I genuinely enjoy Greg Pfister’s writing; his is one of the blogs I’d recommend you follow if you are in HPC. His latest post attempts to reason out some of the problems that may have put the kibosh on the commercial debut of Larrabee, and is based on a talk that Tom Forsyth, one of the Larrabee architects, gave earlier this month at Stanford.
But, among many other interesting things, he also said the target memory bandwidth – presumably required for adequate performance on the software renderer being written (and rewritten, and rewritten…) – was to be able to read 2 bytes / thread / cycle.
…The thing is: 2 bytes / cycle / thread is a lot. It’s so big that a mere whiff of it would get HPC people, die-hard old-school HPC people, salivating. Let’s do some math:
Let’s say there are 100 processors (high end of numbers I’ve heard). 4 threads / processor. 2 GHz (he said the clock was measured in GHz).
That’s 100 cores x 4 threads x 2 GHz x 2 bytes = 1600 GB/s.
Greg puts this in perspective by pointing out that at that rate you could move the full contents of a 1.5 TB drive every second, and that it’s more than 100 times the bandwidth of QuickPath.
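If you want to replay the arithmetic, here’s a quick back-of-the-envelope sketch in Python. The core count, thread count, and clock are the figures quoted above; the QuickPath number (roughly 12.8 GB/s in one direction on a 6.4 GT/s link) is my own assumption for the comparison.

```python
# Larrabee aggregate memory-bandwidth demand, using Greg's numbers.
cores = 100                      # high end of the numbers he'd heard
threads_per_core = 4
clock_ghz = 2.0                  # "the clock was measured in GHz"
bytes_per_thread_per_cycle = 2   # Tom's stated target

bandwidth_gbs = cores * threads_per_core * clock_ghz * bytes_per_thread_per_cycle
print(f"Aggregate demand: {bandwidth_gbs:.0f} GB/s")          # 1600 GB/s

# Assumed QuickPath figure: ~12.8 GB/s one way for a 6.4 GT/s link.
qpi_gbs = 12.8
print(f"QuickPath multiple: {bandwidth_gbs / qpi_gbs:.0f}x")  # ~125x
```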
In other words, it’s impossible. Today. It might be that Intel is pausing Larrabee to wait for product shipment of some futuristic memory technology, like the 3D stacked chips with direct vias (vertical wires) passing all the way through the RAM chip to the processor stacked on it (Exascale Ambitions talk at Salishan 20 by Bill Camp, p. 21).
Greg also talks about difficulties with the software model:
Tom also said that the good news is that they can do 20 different rendering pipelines, all on the same hardware, since it’s a software renderer; and the bad news is that they have to. He spoke of shipping a new renderer, optimized for a hot new game, six months after the game shipped.
The full post is a worthwhile read; I recommend it.