Offloading Application Segments to Intel Xeon Phi Coprocessors

Offloading to a coprocessor does need to be considered carefully, due to the memory transfer requirements. When the data that is to be worked on resides in the memory of the main system, that data must be transferred to the coprocessor’s memory. The challenge arises because memory is not physically shared between the main system and the coprocessor.
“There are two offload models that the developer must consider when programming an application. The first is the non-shared memory model, and the second is the virtual shared memory model. Both of these models can be used in the same application.”

Prefetching Data for Intel Xeon Phi

“Prefetching on a coprocessor such as the Intel Xeon Phi coprocessor can be more important than on a main CPU such as the Intel Xeon CPUs. Since the cores on the Intel Xeon Phi coprocessor are in-order, they cannot hide memory latency as compared to an out-of-order CPU. In addition, since a coprocessor does not have an L3 cache, L2 misses must then access the slower memory subsystem.”