Application developers have a lot to consider when designing and implementing High Performance Computing (HPC) applications. A wide range of decisions must be made about how to use both the main CPUs and coprocessors, such as the Intel Xeon Phi coprocessor, effectively and efficiently. One of the main decisions is how to distribute the work between the CPU and the coprocessor.
A number of factors influence this decision. The main concern is how much of the application can be parallelized: if large sections can be offloaded to a coprocessor, that weighs heavily in the decision. Offloading parts of the application to the coprocessor is most appropriate when the application cannot be made highly parallel throughout most of its run time.
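The impact of the parallel fraction can be made concrete with Amdahl's law. The helper below is an illustrative sketch (the function name is ours, not from the source): with parallel fraction `p` of the run time and `n` cores, the best possible speedup is bounded no matter how many cores the coprocessor adds.

```c
/* Amdahl's law: with parallel fraction p of the run time executing
 * on n cores, the ideal overall speedup is 1 / ((1 - p) + p / n).
 * The serial fraction (1 - p) places a hard ceiling on the gain. */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For example, a 90%-parallel application on 60 cores (a typical Xeon Phi core count) speeds up by less than 9x, which is why the serial fraction dominates the CPU-versus-coprocessor decision.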
When an application is I/O intensive, the host is best at running those parts of the code. If a large application with a small number of hotspots is being executed, then offloading those hotspots to the coprocessor is best. However, it is important to note that the performance gains from using the coprocessor must outweigh the time and expense of transferring the data.
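A hotspot offload can be sketched with OpenMP 4.x target directives; this is an illustrative example, not code from the source (the Intel compiler also offered a proprietary `#pragma offload target(mic)` form). The `map` clauses make the data-transfer cost discussed above explicit, and if no coprocessor is present the region falls back to running on the host.

```c
/* Hypothetical hotspot: a SAXPY-style loop offloaded to the coprocessor.
 * map(to:...) copies inputs to the device; map(tofrom:...) copies data
 * back when the region ends. Without OpenMP offload support, the
 * pragmas are ignored and the loop runs serially on the host. */
double saxpy_sum(float a, const float *x, float *y, int n)
{
    double sum = 0.0;
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n]) map(tofrom: sum)
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
        sum += y[i];
    }
    return sum;
}
```

The per-call transfer of `x` and `y` is exactly the expense that must be outweighed by the parallel speedup of the loop itself.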
Native execution is a good fit for applications whose operations map to parallelism in threads or vectors. However, running natively on the coprocessor is not ideal when the application must do a lot of I/O or spends large parts of its run time in serial code. Offloading has its own issues. Asynchronous allocation, copying, and deallocation of data can be performed, but managing them is complex. Another challenge with offloading is that it requires memory blocking.
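The asynchronous allocation, copy, and deallocation steps mentioned above can be sketched with OpenMP 4.5 `target data` directives; this is an assumed, illustrative decomposition, not the source's code. Each stage is a separate directive, which is precisely the complexity the text refers to; without a device or OpenMP support, the pragmas are ignored and the loop simply runs serially.

```c
/* Sketch of asynchronous offload data management (OpenMP 4.5):
 * allocation, copy-in, compute, and copy-back/free are separate,
 * explicitly ordered steps. */
void scale_buffer(float *buf, int n, float factor)
{
    /* allocate device storage without copying any data */
    #pragma omp target enter data map(alloc: buf[0:n])

    /* asynchronous copy-in; depend() orders it before the compute step */
    #pragma omp target update to(buf[0:n]) depend(out: buf[0:n]) nowait

    /* asynchronous compute on the device, ordered after the copy-in */
    #pragma omp target depend(inout: buf[0:n]) nowait
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        buf[i] *= factor;

    /* wait for the async steps, then copy back and free device storage */
    #pragma omp taskwait
    #pragma omp target exit data map(from: buf[0:n])
}
```

When the working set exceeds the coprocessor's memory, the same pattern is applied to one block of `buf` at a time, which is the memory blocking the text mentions.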
Overall, it is important to understand the application, the workflow within it, and how to use the Intel Xeon Phi coprocessor most effectively.
Source: Intel, USA