Following up on yesterday’s pointer to AnandTech’s article on AMD’s position in the HPC/server market, here is a pointer to another article from them this week, this one on Bulldozer, AMD’s next server architecture.
Bulldozer moves to heterogeneous cores on the die: each module includes two integer cores, each with its own instruction scheduler and L1 instruction and data caches, and (only) one floating point core.
Intel’s Core architecture uses a unified scheduler fielding all instructions, whether integer or floating point. AMD’s architecture uses independent integer and floating point schedulers. While Bulldozer doubles up on the integer schedulers, there’s only a single floating point scheduler in the design.
Behind the FP scheduler are two 128-bit wide FMACs. AMD says that each thread dispatched to the core can take one of the 128-bit FMACs or, if one thread is purely integer, the other can use all of the FP execution resources to itself.
AMD believes that 80%+ of all normal server workloads are purely integer operations.
We’ll have to wait and see what all is encoded in that statement before we know how important the chip will be for HPC in general. Is HPC a “normal server workload?” We typically think of traditional scientific and engineering calculations as unbalanced in favor of floating point, but this may not always be the case depending upon how much indexing and data structure traversal you do in your application.
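To make the FMAC sharing rule from the quote concrete, here is a toy sketch of how the floating point width in one module would be divided between its two threads. It only encodes what AMD is quoted as saying above (two 128-bit FMACs, one per thread, or both to a single thread if the other is purely integer); the function name and the idea of flagging each thread as "uses FP" are my own assumptions for illustration, and this says nothing about how the real scheduler behaves cycle to cycle.

```python
# Toy model of the FP sharing rule quoted above. Not a hardware simulation.
FMAC_WIDTH_BITS = 128   # from the quote: two 128-bit wide FMACs
FMACS_PER_MODULE = 2


def fp_bits_per_thread(uses_fp):
    """Return the FP width (in bits) available to each of a module's two threads.

    uses_fp is a pair of booleans, one per thread (hypothetical flag: True
    if the thread issues floating point work, False if purely integer).
    """
    if len(uses_fp) != 2:
        raise ValueError("a module runs exactly two threads in this sketch")

    active = sum(1 for u in uses_fp if u)
    if active == 0:
        return (0, 0)

    total_bits = FMAC_WIDTH_BITS * FMACS_PER_MODULE
    share = total_bits // active  # 128 each if both use FP, 256 if only one does
    return tuple(share if u else 0 for u in uses_fp)


if __name__ == "__main__":
    print(fp_bits_per_thread((True, True)))
    print(fp_bits_per_thread((True, False)))
```

If your code looks like the first case on every core, you are effectively getting one 128-bit FMAC per "core"; the second case is the one AMD is betting on for most server workloads.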
What is going to be helpful to know is how AMD is counting cores:
Henceforth AMD is referring to the number of integer cores on a processor when it counts cores. So a quad-core Zambezi is made up of four integer cores, or two Bulldozer modules. An eight-core would be four Bulldozer modules.
It’s a distinct shift from AMD’s (and Intel’s) current method of counting cores. A quad-core Phenom II X4 is literally four Phenom II cores on a single die, if you disabled three you would be left with a single core Phenom II. The same can’t be said about a quad-core Bulldozer. The smallest functional block there is a module, which is two cores according to AMD.
It’s also a pain to keep track of: “are those Intel cores or AMD cores?”
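For anyone trying to keep the bookkeeping straight, the counting convention described above reduces to a bit of arithmetic. This is just a sketch of that convention as quoted (two integer cores and one shared floating point unit per module); the function names are mine, not anything AMD or AnandTech uses.

```python
# Sketch of AMD's Bulldozer core-counting convention as described above.
INTEGER_CORES_PER_MODULE = 2  # from the quote: a module is two cores
FP_UNITS_PER_MODULE = 1       # one shared floating point unit per module


def advertised_cores(modules):
    """Cores as AMD counts them: integer cores across all modules."""
    return modules * INTEGER_CORES_PER_MODULE


def modules_for(cores):
    """Modules behind an advertised core count."""
    if cores <= 0 or cores % INTEGER_CORES_PER_MODULE != 0:
        # The smallest functional block is a module, so odd counts don't exist.
        raise ValueError("core count must be a positive multiple of 2")
    return cores // INTEGER_CORES_PER_MODULE


def fp_units_for(cores):
    """Shared FP units behind an advertised core count."""
    return modules_for(cores) * FP_UNITS_PER_MODULE


if __name__ == "__main__":
    for cores in (4, 8):
        print(cores, "cores:", modules_for(cores), "modules,",
              fp_units_for(cores), "FP units")
```

So the quad-core Zambezi from the quote works out to two modules and two floating point units, and an eight-core part to four of each.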
Part of the motivation for this approach is AMD’s belief that the GPU (or, more generally, an accelerator) is going to be the preferred place to do the non-integer work in an application:
AMD did add that eventually, in a matter of 3 – 5 years, most floating point workloads would be moved off of the CPU and onto the GPU. At that point you could even argue against including any sort of FP logic on the “CPU” at all. It’s clear that AMD’s design direction with Bulldozer is to prepare for that future.
I’m not sure I buy that, at least not in a world of PCI-e connected accelerators. Today I think it’s more likely that the GPU will move onto the CPU, creating even more core heterogeneity on the die, not less. But as I’ve said many times in this space before, I don’t run a multi-billion dollar chip company.