Intel has started talking about Knights Ferry, the manycore development platform that precedes the Knights Corner chip. Both are part of the Many Integrated Core (MIC) architecture announced in May; Knights Ferry was briefed by Kirk Skaugen at ISC’10.
PGI’s Michael Wolfe wrote an interesting overview of Knights Ferry for HPCwire late last week that compares the chip to Fermi and discusses its similarities to Larrabee, the little platform that couldn’t.
The Knights Ferry has 32 x86 cores on chip, each with 32KB L1 instruction cache, 32KB L1 data cache, and 256KB L2 cache. I will refer to them as 32 processors. Each processor has a vector unit, essentially a very wide (512 bits or 16 floats) SSE unit, allowing 16 single-precision floating-point operations in a single instruction. Double-precision compute throughput is half that of single-precision. The 32 data caches are kept coherent by a ring network, which is also the connection to the on-chip memory interface(s). Each processor supports a multithreading depth of four, enough to keep the processor busy while filling an L1 cache miss. The Knights Ferry is implemented on a PCI card, and has its own memory, connected to the host memory through PCI DMA operations. This interface may change in future editions, but Intel advertises the MIC as “an Intel Co-Processor Architecture.” This could be taken as acknowledgement that accelerators can play a legitimate role in the high performance market.
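To put the per-core figures above in chip-wide terms, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted in the description; the function name `chip_totals` and the dictionary keys are made up for illustration. It also assumes, optimistically, that every core issues one full-width vector instruction at the same time, and it counts each lane as one operation. No clock rate is given above, so nothing here is converted to FLOPS.

```python
# Figures taken from the description above.
CORES = 32             # x86 cores ("processors")
VECTOR_BITS = 512      # width of each core's vector unit
SP_BITS = 32           # size of a single-precision float
L1I_KB = 32            # L1 instruction cache per core
L1D_KB = 32            # L1 data cache per core
L2_KB = 256            # L2 cache per core
THREADS_PER_CORE = 4   # multithreading depth


def chip_totals():
    """Aggregate the per-core figures across the whole chip (illustrative only)."""
    sp_lanes = VECTOR_BITS // SP_BITS   # floats per vector instruction
    dp_per_instr = sp_lanes // 2        # double-precision throughput is half
    return {
        "sp_lanes_per_core": sp_lanes,
        # Assumes all cores issue one vector instruction together.
        "sp_ops_per_vector_issue": CORES * sp_lanes,
        "dp_ops_per_vector_issue": CORES * dp_per_instr,
        "hardware_threads": CORES * THREADS_PER_CORE,
        "total_l1_kb": CORES * (L1I_KB + L1D_KB),
        "total_l2_kb": CORES * L2_KB,
    }


if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(f"{name}: {value}")
```

Under those assumptions the chip exposes 128 hardware threads, up to 512 single-precision (256 double-precision) operations per chip-wide vector issue, and 8MB of L2 cache in aggregate. Real throughput depends on the clock rate, the instruction mix, and how well the four threads per core hide memory latency.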