High performance systems now typically contain a host processor and a coprocessor. The coprocessor lets developers and users significantly speed up simulations when the algorithm in use can be highly parallelized and can take advantage of a SIMD architecture. The Intel Xeon Phi coprocessor is an example of a coprocessor that is used in many HPC systems today.
While main CPUs have been increasing their core counts in recent years, topping out at 22 cores as of mid-2016, the Intel Xeon Phi coprocessor contains over 60 cores, although each core is somewhat less powerful. Both evolutionary and revolutionary, the Intel Xeon Phi coprocessor gives developers the ability to write code as they would for a host CPU while taking advantage of the large number of cores and the SIMD architecture. When developing new applications, it is important to think in terms of how the coprocessor is designed in order to take advantage of the large number of cores.
A modern server contains both the host CPUs and the Intel Xeon Phi coprocessor, which is attached to the PCI Express bus interface. The Intel Xeon Phi coprocessor comprises up to sixty-one cores that execute Intel Architecture (IA) instructions, connected by a bidirectional on-die interconnect. Each core contains a 512-bit wide vector processing unit with an extended math unit. An important aspect of the coprocessor design is that data in memory be accessible in the fastest possible manner. To that end, the Intel Xeon Phi coprocessor utilizes GDDR5 memory and contains up to eight memory controllers, each of which supports two GDDR5 memory channels. This leads to very high memory performance and less waiting for data to be delivered to each core when needed.
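To make the figures above concrete, the following minimal sketch works through the arithmetic they imply: how many data elements fit in one 512-bit vector, how many GDDR5 channels result from eight times two, and roughly how many vector operations each core would issue for one pass over an array. The function names, the one-million-element array, and the assumptions of an even split across cores and perfect vectorization are illustrative only and do not come from Intel documentation; real code rarely reaches this idealized bound.

```python
# Figures taken from the text above.
VECTOR_BITS = 512            # width of each core's vector processing unit
MAX_CORES = 61               # "up to sixty-one cores"
MEMORY_CONTROLLERS = 8       # eight units, each supporting two GDDR5 channels
CHANNELS_PER_CONTROLLER = 2


def simd_lanes(element_bits: int) -> int:
    """Number of elements of the given size that fit in one vector register."""
    if element_bits <= 0 or VECTOR_BITS % element_bits != 0:
        raise ValueError("element size must evenly divide the vector width")
    return VECTOR_BITS // element_bits


def total_memory_channels() -> int:
    """GDDR5 channels available when every controller is populated."""
    return MEMORY_CONTROLLERS * CHANNELS_PER_CONTROLLER


def vector_ops_per_core(n_elements: int, element_bits: int,
                        cores: int = MAX_CORES) -> int:
    """Vector operations issued by the busiest core for one pass over the data.

    Idealized assumption (hypothetical, for illustration): the elements are
    split evenly across cores and every operation uses a full vector.
    """
    per_core = -(-n_elements // cores)                 # ceiling division
    return -(-per_core // simd_lanes(element_bits))    # ceiling division


if __name__ == "__main__":
    print("single-precision lanes:", simd_lanes(32))
    print("double-precision lanes:", simd_lanes(64))
    print("GDDR5 channels:", total_memory_channels())
    print("vector ops per core, 1,000,000 floats:",
          vector_ops_per_core(1_000_000, 32))
```

Under these assumptions a single vector holds 16 single-precision or 8 double-precision values, so one pass over a million single-precision elements shrinks from a million scalar operations on one core to about a thousand vector operations on each of 61 cores. That gap is the reason the data layout and loop structure of an application matter so much on this architecture.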
It is important that software developers understand some aspects of the hardware design in order to develop efficient and optimized applications.
Source: Intel, USA