When developing or modifying an application to run in a heterogeneous environment that consists of a main CPU (Intel Xeon) and a coprocessor (Intel Xeon Phi), it is important to tune for both types of processors. In many cases, the single most important preparation for programming the Intel Xeon Phi coprocessor is to maximize performance on the Intel Xeon CPU first.
Parallel software and parallel hardware used together give the best results for an application. If the application is serial and the processor is serial, there will obviously be no great gain in performance. If the application is parallelized but the processor is serial, there is again no great gain. A third combination is a serial application on parallel hardware: since the application cannot take advantage of the increased power of the hardware, there will not be a great performance boost. The best, and really the only, solution is to modify the application to run in parallel on high-performing parallel hardware.
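The four combinations above can be illustrated with Amdahl's law, a standard textbook model that is brought in here as an assumption and does not come from the original text. The sketch below is minimal: the 95% parallel fraction and the 60 workers are hypothetical values chosen only to make the contrast visible.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is not sped up; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical values: 0.95 parallel fraction, 60 hardware workers.
combinations = {
    "serial app, serial hardware": amdahl_speedup(0.0, 1),
    "parallel app, serial hardware": amdahl_speedup(0.95, 1),
    "serial app, parallel hardware": amdahl_speedup(0.0, 60),
    "parallel app, parallel hardware": amdahl_speedup(0.95, 60),
}

for name, speedup in combinations.items():
    print(f"{name}: {speedup:.2f}x")
```

Under this model, only the last combination yields a speedup noticeably above 1x, which matches the argument that both the software and the hardware have to be parallel.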
While both Intel Xeon CPUs and the Intel Xeon Phi coprocessor can run parallel applications, the potential for major gains at high thread counts belongs to the Intel Xeon Phi coprocessor. With the same number of threads on both, up to the limit of the Intel Xeon CPU, the Intel Xeon CPU will perform better. But since the Intel Xeon Phi coprocessor has many more cores (although at slower clock rates), an application that can scale into the hundreds of threads will see better performance on the coprocessor.
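A toy model can make this trade-off concrete. The sketch below assumes perfectly scalable work and uses hypothetical thread counts and per-thread speeds. None of these numbers are Intel specifications, and the `Processor` class exists only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    hardware_threads: int    # hypothetical thread capacity
    per_thread_speed: float  # hypothetical relative speed of one thread

    def throughput(self, software_threads: int) -> float:
        """Relative throughput, assuming the work scales perfectly."""
        # Threads beyond the hardware limit add nothing in this model.
        usable = min(software_threads, self.hardware_threads)
        return usable * self.per_thread_speed


# Illustrative values only: few fast threads versus many slower ones.
cpu = Processor("CPU (hypothetical)", hardware_threads=16, per_thread_speed=1.0)
coprocessor = Processor("coprocessor (hypothetical)", hardware_threads=240,
                        per_thread_speed=0.25)

for threads in (16, 240):
    print(f"{threads} threads: "
          f"{cpu.name} = {cpu.throughput(threads)}, "
          f"{coprocessor.name} = {coprocessor.throughput(threads)}")
```

With these assumed values the CPU leads at 16 threads, while the coprocessor leads once the application can use all 240. Real behavior also depends on vectorization, memory bandwidth, and how well the application actually scales.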
Two investigations should take place when beginning a project to utilize the high-performance hardware in modern Intel-based systems:
- Scaling – is the application's algorithm designed so that it can scale to over 100 threads? A starting point for this discussion is whether the application scales at all on the Intel Xeon CPU.
- Vectorization and Memory Usage – is the application making use of the vector units and instructions? And can the application use more memory bandwidth than is available on the Intel Xeon CPU system?
If the answer to these questions is yes, then a highly optimized application can be written (or modified) and tuned to take advantage of the power of both the Intel Xeon CPU and the Intel Xeon Phi coprocessor.
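One way to approach the scaling question is to time the application at several thread counts on the Intel Xeon CPU and compute speedup and parallel efficiency from those timings. The sketch below assumes such timings have already been collected. The `scaling_report` helper is hypothetical, and the timing values are placeholders invented for illustration, not measurements.

```python
def scaling_report(timings):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run.

    `timings` maps a thread count to a wall-clock time in seconds.
    """
    if 1 not in timings:
        raise ValueError("a single-thread baseline timing is required")
    baseline = timings[1]
    rows = []
    for threads in sorted(timings):
        speedup = baseline / timings[threads]
        efficiency = speedup / threads  # 1.0 means perfect scaling
        rows.append((threads, speedup, efficiency))
    return rows


# Placeholder wall-clock times in seconds, invented for illustration only.
example_timings = {1: 8.0, 2: 4.0, 4: 4.0}

for threads, speedup, efficiency in scaling_report(example_timings):
    print(f"{threads:>3} threads: speedup {speedup:.2f}x, "
          f"efficiency {efficiency:.0%}")
```

If efficiency drops sharply at low thread counts, as it does at 4 threads in the placeholder data, the application is unlikely to benefit from hundreds of threads until that bottleneck is addressed.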
Source: Intel, USA