Innovation in high-performance computing requires creating new technology while maintaining compatibility with older applications. The Intel Xeon Phi processor is an example: it delivers a new class of performance while retaining the ability to run older applications through a standard, well-known instruction set.
A new generation of hardware with this level of performance is only useful if developers understand the basics and are familiar with the architecture of the new system. Single-thread performance on the Intel Xeon Phi processor is significantly better than in previous designs. To speed things up further, vector processing, where applicable, is critical to application performance. With two vector processing units (VPUs) per core, an application can execute two 512-bit vector multiply-add instructions per cycle, so each core can deliver 32 double-precision operations per clock cycle (2 VPUs x 8 double-precision lanes x 2 operations per multiply-add). The VPUs execute all floating-point operations, covering instructions from legacy SSE through AVX to the new AVX-512.
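The per-core figure above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted in the text; the function names and the `cores` and `clock_ghz` parameters are hypothetical and must be supplied by the reader for a specific part, since the text does not state them.

```python
# Peak double-precision operations per cycle for one core,
# using only the figures quoted in the text.
VECTOR_WIDTH_BITS = 512   # AVX-512 register width
DOUBLE_BITS = 64          # size of one double-precision value
VPUS_PER_CORE = 2         # vector processing units per core
OPS_PER_FMA = 2           # a multiply-add counts as one multiply plus one add


def dp_ops_per_cycle(vpus=VPUS_PER_CORE, width_bits=VECTOR_WIDTH_BITS):
    """Return peak double-precision operations per clock cycle for one core."""
    lanes = width_bits // DOUBLE_BITS   # 8 doubles fit in a 512-bit vector
    return vpus * lanes * OPS_PER_FMA


def peak_gflops(cores, clock_ghz):
    """Theoretical peak in GFLOP/s (hypothetical helper, caller supplies both values)."""
    return cores * clock_ghz * dp_ops_per_cycle()


print(dp_ops_per_cycle())  # 32
```

This is a theoretical ceiling: it assumes every cycle issues a full-width multiply-add on both VPUs, which real applications rarely sustain.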
The basic core in this system is a 2-wide, out-of-order core that supports up to four hardware threads at the same time, which can dramatically improve the performance of even legacy applications. To keep the cores working, there is also 1 MB of L2 cache, shared by the two cores of each tile. The caches are coherent across all of the cores, so a write to a cache line invalidates all other copies of that line.
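The invalidate-on-write behavior described above can be illustrated with a toy model. This is only a sketch of the general write-invalidate idea, not the processor's actual coherence protocol: the class `ToyCoherentCaches`, its methods, the one-private-cache-per-core layout, and the write-through backing store are all simplifying assumptions made for illustration.

```python
class ToyCoherentCaches:
    """Toy write-invalidate model: one private cache (a dict) per core."""

    def __init__(self, num_cores):
        self.memory = {}                                  # backing store: line -> value
        self.caches = [dict() for _ in range(num_cores)]  # per-core cached lines

    def read(self, core, line):
        cache = self.caches[core]
        if line not in cache:                  # miss: fetch from the backing store
            cache[line] = self.memory.get(line, 0)
        return cache[line]

    def write(self, core, line, value):
        for other, cache in enumerate(self.caches):
            if other != core:
                cache.pop(line, None)          # invalidate every other copy
        self.caches[core][line] = value
        self.memory[line] = value              # write-through keeps the toy simple

    def holders(self, line):
        """Return the cores that currently hold a valid copy of the line."""
        return [c for c, cache in enumerate(self.caches) if line in cache]


caches = ToyCoherentCaches(num_cores=4)
for core in range(4):
    caches.read(core, 0x40)       # all four cores now cache line 0x40
caches.write(1, 0x40, 7)          # core 1 writes; other copies are dropped
print(caches.holders(0x40))       # [1]
print(caches.read(3, 0x40))       # 7 (core 3 misses and refetches the new value)
```

After the write, only the writing core holds the line; any other core that touches it again takes a miss and sees the updated value, which is the property the text describes.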
Each tile in the Intel Xeon Phi processor consists of two cores, each with two VPUs, and the L2 cache they share. Operating together with all of the other tiles and with MCDRAM and DDR memory, the tiles form an extremely high-performance system. Because the architecture remains familiar to developers, applications can be modified as needed, while older applications still run as expected.