For some years, and especially during the last six months before the Bulldozer launch in 2011, AMD continuously made various claims about its performance, always stressing that it would crush Sandy Bridge because it had twice as many "cores".
Immediately after launch, all the AMD claims were proved to be shameless lies. There is absolutely no excuse for AMD. Intel had published accurate information about the performance of Sandy Bridge, one year in advance. Even without the advance publication, the performance of Sandy Bridge could have been easily extrapolated from the evolution of Penryn, Nehalem and Westmere.
The performance of Bulldozer was determined by design decisions taken by AMD at least 3 to 4 years before its launch. Therefore, during the last 6 months, when they lied the most, they knew perfectly well that they were lying, and it should have been obvious to them that this was futile, because the lies would be exposed by independent benchmarks immediately after launch.
For most operations, a so-called Bulldozer "module" (with 2 "cores") had exactly the same execution resources as a Sandy Bridge core, while for a few operations, like integer multiplication, a Bulldozer "module" had even fewer execution resources than a Sandy Bridge core.
The common FPU of 2 Bulldozer "cores" was more or less equivalent to the FPU of a single Sandy Bridge core.
The Bulldozer integer "cores" were stripped down in comparison with the cores of the previous AMD CPUs, e.g. a new "core" had 2/3 of the addition throughput and 1/2 of the multiplication throughput of an old core.
So 2 Bulldozer integer "cores" were together barely faster than one old AMD core, while a single Sandy Bridge core was faster than them, having equal addition throughput but double the multiplication throughput.
Bulldozer could have been a decent CPU if only AMD had described it as a 4-core/8-thread CPU instead of a 4-"module"/8-"core" CPU. That description matched Sandy Bridge, except that in Sandy Bridge the 2 threads of a core dynamically shared most execution resources, while in Bulldozer only part of the execution resources (e.g. the FPU) were shared dynamically and the rest were allocated statically to the 2 threads of a core, which is less efficient.
Such a description would have set the users' expectations correctly, and they would not have felt cheated.
I am pretty certain that the huge disappointment that accompanied the Bulldozer launch hurt AMD sales much more than AMD gained from the buyers lured by the false advertising about the 8-core monster AMD CPUs, which were supposed to easily beat the puny 4-core Intel CPUs.
(author here) The FPU is not quite equivalent to a Sandy Bridge FPU, but the FPU is one of the strongest parts of the Bulldozer core. Also, iirc multiply throughput is the same on Bulldozer and K10 at 1 per cycle. K10 uses two pipes to handle high precision multiply instructions that write to two registers, possibly because each pipe only has one result bus and writing two regs requires using both pipes' write ports. But that doesn't mean two multiplies can complete in a single cycle.
With regard to expectations, I don't think AMD ever said that per-thread performance would be a match for Sandy Bridge. ST performance imo was Bulldozer's biggest problem. Calling it 8 cores or 4 cores does not change that. You could make a CPU with say, eight Jaguar cores, and market it as an eight core CPU. It would get crushed by 4c/8t Zen despite the core count difference and no sharing of core resources between threads (on Jaguar).
All AMD CPUs since the first Opteron in 2003 until before Bulldozer had a 64-bit integer multiplication throughput of 1 per 2 clock cycles.
Initially the Intel CPUs had a much lower throughput, but they improved in each generation, until they matched AMD in Nehalem.
In Sandy Bridge, Intel doubled the 64-bit integer multiplication throughput to 1 per clock cycle.
On the other hand, in Bulldozer AMD reduced the 64-bit integer multiplication throughput to 1 per 4 clock cycles.
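To make the comparison above concrete, here is a minimal sketch that only restates the per-core figures quoted in this comment and scales them to a Bulldozer "module" and to the two chips under discussion. The helper name `chip_mul64_per_cycle` is hypothetical, and the numbers are peak per-cycle rates taken from the thread, not measurements.

```python
from fractions import Fraction

# 64-bit integer multiplications per clock cycle, per core, as quoted in this thread.
MUL64_PER_CYCLE = {
    "K8/K10": Fraction(1, 2),        # 1 per 2 clock cycles
    "Sandy Bridge": Fraction(1, 1),  # 1 per clock cycle
    "Bulldozer": Fraction(1, 4),     # 1 per 4 clock cycles
}

# Core counts of the two chips compared in this thread (8 "cores" vs. 4 cores).
CORES = {"Sandy Bridge": 4, "Bulldozer": 8}


def chip_mul64_per_cycle(name: str) -> Fraction:
    """Peak 64-bit multiplies per cycle for the whole chip (hypothetical helper)."""
    return MUL64_PER_CYCLE[name] * CORES[name]


# One Bulldozer "module" contains two integer "cores".
bulldozer_module = 2 * MUL64_PER_CYCLE["Bulldozer"]

if __name__ == "__main__":
    print("Bulldozer module:", bulldozer_module, "per cycle")
    for name in CORES:
        print(name, "chip:", chip_mul64_per_cycle(name), "per cycle")
```

Under these figures a whole Bulldozer "module" has half the peak 64-bit multiply rate of one Sandy Bridge core, and the same factor of two carries over to the full chips at equal clock frequency.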
The FPU of Bulldozer had the additional advantage of implementing FMA, but the total throughput in FP multiplications + additions of the 4 Bulldozer FPUs was equal to the total throughput of the 4 Sandy Bridge FPUs.
While you are right that calling Bulldozer a 4-core CPU does not change the user expectations about ST performance, it totally changes the user expectations about MT performance.
In 2011, people were less shocked by the low ST performance (though they were surprised that it was lower than in the AMD Barcelona derivatives), because that had been a given ever since Intel introduced Core 2, than by the low MT performance, seeing an "8-core" CPU trounced by a 4-core CPU.
After being exposed to the AMD propaganda, people expected that Bulldozer was unlikely to match Intel in ST performance, but that it would have a consistent advantage in MT performance, due to being "8-core".
Yeah you're right about the multiplication performance. I checked back and 64-bit integer multiplication is one per four clocks.
I disagree that core count should be taken to mean anything about MT performance. You always have to consider the strength of each core too. Nor does twice as many cores for the same architecture imply 2x performance, because there are always shared things like cache and memory bandwidth. And even if those aren't limiting factors, MT boost clocks are often lower than ST ones.
Huh, so the benchmarks that show that an FX-8xxx is about on par with an i3 for ST, but even better than an i5 for MT, are misleading?
I actually have no complaints about MT performance (even though learning that it wasn't a "real" 8-core was disappointing), I can even run VR because it takes full advantage of MT, even though ST is often struggling for other tasks...
The benchmarks that you have in mind had probably been run on some later Piledriver/Steamroller/Excavator models, which corrected some of the initial problems, like the too-low instruction decoding throughput, and which raised the clock frequency, due to an improved CMOS SOI process.
I also had a 4.4 GHz AMD Richland APU, which was reasonably faster for most tasks than an Intel Haswell U i5, but the speed ratio was much, much less than the power consumption ratio of 100 W for AMD vs. 15 W for Intel.
The original Bulldozer fared much worse against Sandy Bridge.
Hmm, but AFAIK the performance didn't radically improve in P/S/E - and anyway, I had assumed that this whole discussion also covered them because B/P/S/E are still all using the same architecture - for instance : doesn't TFA apply to P/S/E ?
> So 2 Bulldozer integer "cores" were together barely faster than one old AMD core
This sounds like a wild claim : I went from a better than midrange 3-core Phenom to a worse than midrange ~~8~~ 4 core Bulldozer, and my singlethreaded performance not only did not nearly halve, but has even improved !
EDIT : By Bulldozer, I mean Excavator (I assume that performance wasn't radically different, especially when compared to the Intel equivalents released at the same time, but I might be wrong ?)
Gaming and enthusiast machines are only a fraction of AMD's market; most consumers and clients didn't care about the features AMD's marketing department lied about.
AMD did not lie about some particular feature of Bulldozer.
They lied by claiming that Bulldozer would be faster than Sandy Bridge, while knowing that it was completely impossible for their claims to be true.
Even without running any benchmark, it was trivial to predict that at similar clock frequencies the "8-core" Bulldozer must be slower than the 4-core Sandy Bridge.
For multi-threaded performance, the complete Bulldozer had between 50% and 100% of the number of arithmetic units of Sandy Bridge, so in the best case it could match but not exceed Sandy Bridge.
For single-threaded integer performance, one Bulldozer "core" had between 25% and 50% of the number of arithmetic units of Sandy Bridge, so in the best case it could be only half as fast as Sandy Bridge.
The only chance for Bulldozer to beat Sandy Bridge as in the AMD claims would have been a very high clock frequency, in the 5 to 7 GHz range.
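The clock-frequency argument above can be sketched as simple arithmetic: if peak throughput is roughly (number of arithmetic units) × (clock), then a chip with only a fraction of the rival's units needs a proportionally higher clock just to break even. This is a rough model, not a benchmark. The function name `break_even_clock_ghz` is hypothetical, and the 3.4 GHz Sandy Bridge clock is an assumption that does not come from this thread.

```python
def break_even_clock_ghz(rival_clock_ghz: float, unit_ratio: float) -> float:
    """Clock needed to match a rival's peak throughput.

    unit_ratio is this chip's arithmetic-unit count divided by the rival's
    (e.g. 0.5 means half as many units). Hypothetical helper for illustration.
    """
    if rival_clock_ghz <= 0 or unit_ratio <= 0:
        raise ValueError("clock and unit ratio must be positive")
    return rival_clock_ghz / unit_ratio


# Assumption: a Sandy Bridge part running at 3.4 GHz.
SANDY_BRIDGE_CLOCK_GHZ = 3.4

if __name__ == "__main__":
    # The thread puts Bulldozer's MT unit ratio between 50% and 100%.
    for ratio in (1.0, 0.75, 0.5):
        clock = break_even_clock_ghz(SANDY_BRIDGE_CLOCK_GHZ, ratio)
        print(f"unit ratio {ratio:.2f}: break-even at about {clock:.1f} GHz")
```

With half the units, the break-even clock under this model is twice the rival's, which is the kind of frequency the comment says Bulldozer would have needed.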
However, AMD never made the false claim that Bulldozer would beat Sandy Bridge by having a much higher clock frequency. They claimed that they would beat Intel by having more "cores", which implied more arithmetic units, while knowing very well that they had decided to remove a large part of the arithmetic units from their cores, at the same time that Intel was adding arithmetic units to its cores.