This week at the International Solid State Circuits Conference (ISSCC) the chip community is gathered to talk about the latest and greatest research and technology breakthroughs, while not sharing anything that might dull their competitive advantage. Chip companies also tend to tie their product announcements in with ISSCC, which explains the timing of the Itanium, POWER, and other announcements over the past several days.
Late last week I talked with AMD about Llano, their forthcoming laptop platform. That’s right, a laptop chip. So why “waste” your time when this is an HPC blog? Well, the concept is interesting, and I bet the technology that they are jamming into that chip finds its way upstream before too long.
AMD’s marketing angle these days is that we are entering the heterogeneous systems era (following the single and multi-core eras), enabled by data parallelism and GPUs. You see this reflected in AMD’s materials when they talk about their “Fusion” strategy. Llano is AMD’s GPU+CPU processor; they call it an APU, or Accelerated Processing Unit. Llano puts 4 x86 (Phenom II) cores plus one GPU (they are mum on the number of cores the GPU itself will have) onto a single die in a 32nm process. The processor supports DDR3 memory (though they aren’t talking about how many channels), is DirectX 11 capable, and will be available for sampling in the first half of this year. It is expected to operate at greater than 3 GHz.
Intel is talking about something similar with its Westmere solution, but it isn’t as integrated as Llano. Westmere puts a CPU and a separate GPU into a single package with a dedicated bus, where Llano bakes them together onto the same silicon.
All we know about the GPU itself is that it is derived from the Radeon HD5000 series. According to Sam Naffziger, a Senior Fellow at AMD, Llano will be capable of handling actual graphics in the laptops it is sold in, in addition to supporting computational tasks. The GPU doesn’t use HyperTransport to link to the CPU; it uses something else within the die that also isn’t being discussed yet.
The initial rollout is planned for the laptop market, followed by desktops. They are looking at a Bulldozer plus GPU Fusion solution for server (and HPC) customers, but no word on any of that yet, and AMD is still pushing the discrete GPU model as the way to go today for HPC users. I hate having to hit the PCI bus to get work out there and bring the answers back, though, and separately managing GPU memory explicitly is not the most programmer-friendly approach I can think of for getting work done. I’d love to see an integrated CPU/GPU out there for high end computing, and I’ll bet someone tries to build a cluster out of these Llano parts.
None of this is what AMD is talking up at ISSCC, though. At the conference they are talking about some of the innovations they are baking into the chips to make them especially power-friendly (and therefore laptop-friendly, but why not datacenter-friendly as well).
Power management and measurement for cores
First, AMD has added power gating at the core level on the die. Today the OS can use the ACPI spec to run AMD chips down to a lower power state, but you have to power down all the cores and the low power state isn’t all that low. In Llano each individual core can be shut down, and the power reduction is far more dramatic (a tenfold reduction in leakage power, according to Naffziger). The solution is based in hardware, and uses counters to watch for an idle core. Cores can be turned off and on at a resolution of tens of microseconds, which is fast enough to not interfere with the coarser-grained operation of ACPI up at the OS level.
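To make the idle-counter idea concrete, here is a toy simulation in Python. It is only a sketch of the general technique, not AMD’s implementation: the threshold, the tick granularity, the wake-on-demand behavior, and every name in it (Core, IDLE_THRESHOLD, simulate) are my own assumptions. The only number taken from the briefing is the roughly tenfold leakage reduction for a gated core.

```python
from dataclasses import dataclass

# Hypothetical parameters, chosen for illustration only.
IDLE_THRESHOLD = 3    # idle ticks a core must accumulate before it is gated
LEAKAGE_ON = 1.0      # normalized leakage of a powered core
LEAKAGE_GATED = 0.1   # about the 10x reduction quoted by AMD


@dataclass
class Core:
    idle_ticks: int = 0
    gated: bool = False

    def tick(self, busy: bool) -> None:
        """Advance one tick; gate the core once it has been idle long enough."""
        if busy:
            self.idle_ticks = 0
            self.gated = False  # assumed: the core wakes as soon as work arrives
        else:
            self.idle_ticks += 1
            if self.idle_ticks >= IDLE_THRESHOLD:
                self.gated = True

    def leakage(self) -> float:
        return LEAKAGE_GATED if self.gated else LEAKAGE_ON


def simulate(activity):
    """activity is a list of ticks; each tick is a list of busy flags, one per core.

    Returns the final core states and the total normalized leakage.
    """
    cores = [Core() for _ in activity[0]]
    total = 0.0
    for flags in activity:
        for core, busy in zip(cores, flags):
            core.tick(busy)
            total += core.leakage()
    return cores, total
```

Feeding this a trace in which one core stays busy and another sits idle shows the idle core dropping to the low-leakage state after a few ticks while its neighbor keeps running at full power, which is exactly what all-or-nothing power-down cannot do.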
Llano also puts a digital power meter on the chip for each core, providing microsecond-level power information that AMD claims is much more accurate than the more conventional analog ways of measuring power. This information will be used when a core is running to help the hardware and the OS optimize power distribution and use. AMD isn’t talking about details here yet, but it sounds useful.
Something I had never thought about as optimizable in terms of chip power consumption is the clock. It turns out that a fully populated clock grid uses about 20-30% of the dynamic power in a chip. AMD depopulated the clock grid in Llano to only put the signal in the specific places it has to be. This seemingly obvious improvement reduces the clock’s power consumption by about half, according to Naffziger. Because the clock is always running, this helps reduce Llano’s power consumption across the board — a good thing in a laptop. And a good thing in a server, come to think of it.
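A quick back-of-the-envelope check on what that is worth, using only the figures quoted above (a clock grid at 20-30% of dynamic power, cut by about half). The function name and the assumption that nothing else in the power budget changes are mine.

```python
def clock_savings(clock_share, reduction=0.5):
    """Fraction of total dynamic power saved when the clock grid's share
    (clock_share, between 0 and 1) is cut by the given reduction."""
    return clock_share * reduction


if __name__ == "__main__":
    for share in (0.20, 0.30):
        saved = clock_savings(share)
        print(f"clock share {share:.0%}: saves about {saved:.0%} of dynamic power")
```

So halving the clock grid’s draw works out to somewhere around 10 to 15 percent of the chip’s dynamic power, before counting any of the other tricks.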