
> and the case was decided against AMD because of the single FPU per core module:

this has always been a massive oversimplification and misrepresentation from the AMD fanclub. the case was decided because of shared resources including the L2 and the frontend (fetch-and-decode, etc) which very legitimately do impact performance in a way that "independent cores" do not.

Citing the actual judgement:

> Plaintiffs allege that the Bulldozer CPUs, advertised as having eight cores, actually contain eight “sub-processors” which share resources, such as L2 memory caches and floating point units (“FPUs”). Id. ¶ 37–49. Plaintiffs allege that the sharing of resources in the Bulldozer CPUs results in bottlenecks during data processing, inhibiting the chips from “simultaneously multitask[ing].” Id. ¶¶ 38, 41. Plaintiffs allege that, because resources are shared between two “cores,” the Bulldozer CPUs functionally only have four cores. Id. ¶ 38–43. Therefore, Plaintiffs claim the products they purchased are inferior to the products as represented by the Defendant. Id. ¶ 39

This is completely correct: the chip has only one frontend per module, which has to alternate between servicing the two "cores", and that does bottleneck their independent operation. It is, for example, not the same thing as a "core" on Phenom, and performance drops significantly whenever the "second thread" on a given module is in use.
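To make the frontend point concrete, here's a toy model (the 4-wide decode width and cycle count are illustrative numbers, not real Bulldozer figures): one shared frontend alternating between a module's two "cores" each cycle, versus two truly independent frontends.

```python
# Toy model of frontend fetch/decode bandwidth. Not a real pipeline
# simulator; DECODE_WIDTH and CYCLES are made-up illustrative values.

DECODE_WIDTH = 4   # instructions decoded per cycle
CYCLES = 1000

def shared_frontend(active_threads):
    # One frontend services the module's threads in strict alternation,
    # one thread per cycle (the CMT-module situation).
    per_thread = [0] * active_threads
    for cycle in range(CYCLES):
        per_thread[cycle % active_threads] += DECODE_WIDTH
    return per_thread

def independent_frontends(active_threads):
    # Each real core has its own frontend every cycle (the Phenom situation).
    return [DECODE_WIDTH * CYCLES] * active_threads

print(shared_frontend(1))        # one thread gets the full bandwidth
print(shared_frontend(2))        # each thread's fetch bandwidth halves
print(independent_frontends(2))  # two independent cores lose nothing
```

Under this (simplified) model, lighting up the second "core" in a module halves each thread's fetch bandwidth, which is exactly the kind of coupling that independent cores don't have.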

It's fine to build hardware this way; SPARC uses the same approach, for example. But SPARC isn't marketed as (e.g.) a "64 core processor"; it's marketed as "8 cores, 64 threads". AMD wanted the marketing bang of having "twice the cores" of Intel (and you can see people repeating that even in this thread: programmers bought AMD for compiling because "it had twice the cores", etc). That is not how anyone else has marketed CMT before or since, because it's really not a fair representation of what the hardware is doing.

Alternately... if that's a core, then Intel's SMT threads are definitely cores too, because CMT is basically just SMT with some resources pinned to specific threads, if you want to look at it that way. After all, where is the definition that says a dedicated integer unit alone is what constitutes a core? Isn't it enough that there are two independent execution contexts that can execute simultaneously, and isn't it better for one of them to use as many resources as possible rather than stall because an execution unit is "pinned" to another thread? If you accept AMD's very expansive definition of "core", a lot of weird stuff shakes out, and I think consumers would have found it very deceptive if Intel had marketed the way AMD did. That's obviously not what a "core" is, but it is if you believe AMD.

AMD did a sketchy thing and got caught, end of story. There's no reason they should call it anything different than the other companies who implement CMT.

I hate hate hate the "AMD got skewered because they didn't have an FPU" narrative. No, it was way more than the FPU, and the plaintiffs said as much, and it's completely deceptive and misrepresentative to pretend that's the actual basis for the case. That's the fanclub downplaying and minimizing again, like they do every time AMD pulls a sketchy move (like any company, there have been a fair few over the years). And that certainly can include El Reg too. Everyone loves an underdog story.

Companies getting their wrists slapped when they do sketchy shit is how the legal system prevents it from happening again and downplaying it as a fake suit over crap is bad for consumers as a whole and should not be done even for the “underdog”. The goal shouldn’t be to stay just on the right side of a grey area, it should be to market honestly and fairly… like the other companies that use CMT did. Simple as.

To wit: NVIDIA had to pay out on the 3.5GB lawsuit even though their cards really do have 4GB. Why? Because it affected performance, and the expectation isn't mere technical correctness, it's that you stay well on the right side of the line with your marketing's honesty. It was sketchy and they got their wrist slapped. As did AMD.



The UltraSPARC T1 shared one FPU and one logical L2 between all 32 threads/8 cores. L2 is very commonly shared between cores; the world has more or less converged on a shared L2 per core complex, so 4 to 8 cores. And you still see vector units shared between cores where it makes sense, too: for instance, Apple's AMX unit is shared between all of its cores.

It's really only the frontend and its data path to L1 that's a good argument here, but that's not actually listed in the complaint.

And even then, I can see where AMD was going. The main point of SMT is to share backend resources that would otherwise be unused on a given cycle, but these have dedicated execution units so it really is a different beast.


> And even then, I can see where AMD was going. The main point of SMT is to share backend resources that would otherwise be unused on a given cycle, but these have dedicated execution units so it really is a different beast.

Sure, but wouldn't it be ideal that if a thread wasn't using its integer unit and the other thread had code that could run on it, you'd allow the other thread to run?

"CMT" is literally just "SMT with dedicated resources" and that's a suboptimal choice because it impairs per-thread performance in situations where there's not anything to run on that unit. Sharing is better.

If the scheduler is insufficiently fair, that's a problem that can be solved. Guarantee that if there is enough work, that each thread gets one of the integer units, or guarantee a maximum latency of execution. But preventing a thread from using an integer unit that's available is just wasted cycles, and that's what CMT does.
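That wasted-cycles claim can be sketched with a toy model (illustrative only: two integer units, thread A always has two ready ops, thread B has work only half the time; the workload mix is an assumption, not measured data). Pinning one unit per thread, CMT-style, leaves B's unit idle on B's quiet cycles, while a shared SMT-style pool lets A soak up the slack:

```python
# Toy scheduling model comparing CMT (pinned integer units) against
# SMT (shared pool). Not a real pipeline simulator; the workload
# (A always has 2 ready ops, B has work on even cycles only) is a
# made-up example to illustrate the wasted-cycles argument.

CYCLES = 1000

def thread_b_has_work(cycle):
    return cycle % 2 == 0  # thread B is busy only half the time

def cmt_ops_completed():
    # CMT: each thread owns exactly one integer unit.
    done = 0
    for c in range(CYCLES):
        done += 1                              # A's pinned unit is always busy
        if thread_b_has_work(c):
            done += 1                          # B's unit works...
        # ...otherwise B's unit idles, even though A has ops ready.
    return done

def smt_ops_completed():
    # SMT: both integer units are a shared pool.
    done = 0
    for c in range(CYCLES):
        if thread_b_has_work(c):
            done += 2                          # one unit each, fair split
        else:
            done += 2                          # A uses both units
    return done

print(cmt_ops_completed())  # pinned units leave cycles on the table
print(smt_ops_completed())  # shared pool keeps every unit busy
```

Under these assumptions the shared pool completes 2000 ops to CMT's 1500 over the same cycles; the gap is exactly the idle time of B's pinned unit.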

Again: CMT is not that different from SMT. It's SMT where resources are fixed to certain threads, and that's suboptimal from a scheduling perspective. And if you think that's enough to be called a "core", well, Intel has been making 8-core chips for a while then. Just 2 cores per module ;)

Consumers would not agree that's a core. And pinning some resources to a particular thread (while sharing most of the rest of the datapath) does not change that, actually it makes it worse.

> It's really only the frontend and it's data path to L1 that's a good argument here, but that's not actually listed in the complaint.

That's just a summary ;) El Reg themselves discussed the shared datapath when the suit was greenlit.

https://www.theregister.com/2019/01/22/judge_green_lights_am...

And you can note the "such as" in the summary, even. That is an expansive term, meaning "including but not limited to".

If you feel that was not addressed in the lawsuit and it was incorrectly settled... please cite.

Again: it's pretty simple, stay far far clear of deceptive marketing and it won't be a problem. Just like NVIDIA got slapped for "3.5GB" even though their cards did actually have 4GB.

With AMD, "cores" that have to alternate their datapath on every other cycle are pretty damn bottlenecked and that's not what consumers generally think of as "independent cores".


> Sure, but wouldn't it be ideal that if a thread wasn't using its integer unit and the other thread had code that could run on it, you'd allow the other thread to run?

> "CMT" is literally just "SMT with dedicated resources" and that's a suboptimal choice because it impairs per-thread performance in situations where there's not anything to run on that unit. Sharing is better.

> If the scheduler is insufficiently fair, that's a problem that can be solved. Guarantee that if there is enough work, that each thread gets one of the integer units, or guarantee a maximum latency of execution. But preventing a thread from using an integer unit that's available is just wasted cycles, and that's what CMT does.

Essentially, no, what you're suggesting is a really poor choice for the gate count and number of execution units in a Bulldozer module. The most expensive parts are the ROBs and their associated bypass networks between the execution units. Doubling that combinatorial complexity would probably lead to a much larger, hotter single core that wouldn't clock nearly as fast (or would need so many pipeline stages that branches become way more expensive, aka the NetBurst model).

> And you can note the "such as" in the summary, even. That is an expansive term, meaning "including but not limited to".

Well, except that I argue it doesn't include those at all; shared L2 is extremely common, and shared FPU is common enough that people don't really bat an eye at it.

> If you feel that was not addressed in the lawsuit and it was incorrectly settled... please cite.

I'm going off your own citation. If you feel these were brought up in the court case itself beyond that, you're more than welcome to cite another example (ideally not a literal tabloid, but something keeping the standards of the court documents you cited before).

> With AMD, "cores" that have to alternate their datapath on every other cycle are pretty damn bottlenecked and that's not what consumers generally think of as "independent cores".

That's not how these work. OoO cores are rarely cranking their frontends at full tilt; instead they tend to work in batches, filling up a ROB with work that is then executed as memory dependencies are resolved. The modern solution to exploiting that is to aggressively downclock the frontend when it isn't needed, to save power, but I can see keeping it clocked with the rest of the logic and simply sharing it between two backends as a valid alternative.


But even this article states that the L2 and front end aren't bottlenecks on simultaneous operation.

Perhaps it'd be more accurate to say the case was lost primarily on the strength of the argument that there are only four FPUs, considering there had been other examples of independent cores sharing L2 and other resources.



