Offloading Application Segments to Intel Xeon Phi Coprocessors

Modern computer systems that contain both a CPU and an accelerator offer several ways to take advantage of the combined hardware. To use the performance of the coprocessor or accelerator, parts of the application must be offloaded to it.

Execution of the application always begins on the host system, with its associated memory and storage. When execution reaches a part of the code that can run on the coprocessor, instructions inserted by the compiler run that part of the application on the coprocessor. Notably, an application built for a coprocessor such as the Intel Xeon Phi runs whether or not the coprocessor is present. If the offloaded code cannot run on a coprocessor, the application continues, and those instructions execute on the host instead. This matters because it preserves application portability, regardless of exactly what hardware is installed in the system.
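
As a minimal sketch of this fallback behavior, assuming the Intel C/C++ compiler's offload extensions: #pragma offload target(mic) marks the block that follows for the coprocessor, and the guards on __INTEL_OFFLOAD and __MIC__ assume those are the Intel compiler's predefined macros for offload builds and for the coprocessor-side pass. The function names sum_offloaded and running_on_coprocessor are hypothetical. A compiler without offload support ignores the unknown pragma, so the same source builds and runs entirely on the host.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when compiled for (and running on) the
   coprocessor, 0 on the host. __MIC__ is assumed to be defined only in
   the Intel compiler's coprocessor-side compilation pass. */
#ifdef __INTEL_OFFLOAD
__attribute__((target(mic)))
#endif
static int running_on_coprocessor(void)
{
#ifdef __MIC__
    return 1;
#else
    return 0;
#endif
}

/* Sums 1..n inside an offload region. With the Intel compiler and a card
   present, the block runs on the coprocessor; if no card is available, or
   if the compiler does not understand the pragma (it simply ignores it),
   the same block runs on the host. */
long sum_offloaded(long n, int *ran_on_card)
{
    long total = 0;
    int on_card = 0;
    assert(ran_on_card != 0);

    #pragma offload target(mic) in(n) out(total, on_card)
    {
        for (long i = 1; i <= n; ++i)
            total += i;
        on_card = running_on_coprocessor();
    }

    *ran_on_card = on_card;
    return total;
}
```

Called as sum_offloaded(100, &flag), it returns 5050 either way; flag reports whether the block actually executed on the card.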

“Offloading to a coprocessor needs to be considered carefully because of the memory transfer requirements. When the data to be worked on resides in the main system's memory, that data must be transferred to the coprocessor's memory. The challenge arises because memory is not physically shared between the main system and the coprocessor.”
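
The transfer cost shows up directly in the data clauses of the offload pragma. In this sketch (again assuming the Intel offload extensions; scale_offloaded and bytes_transferred are hypothetical names), in(src : length(n)) copies the input array into coprocessor memory before the block runs, and out(dst : length(n)) copies the results back afterwards, so every call moves both arrays across the bus.

```c
#include <assert.h>
#include <stddef.h>

/* Scales n doubles from src into dst inside an offload region.
   in(src : length(n))  copies n elements host -> coprocessor first.
   out(dst : length(n)) copies n elements coprocessor -> host afterwards.
   A compiler without offload support ignores the pragma and runs the
   loop on the host. */
void scale_offloaded(const double *src, double *dst, int n, double factor)
{
    assert(src != NULL && dst != NULL && n >= 0);

    #pragma offload target(mic) in(src : length(n)) out(dst : length(n)) in(n, factor)
    {
        for (int i = 0; i < n; ++i)
            dst[i] = factor * src[i];
    }
}

/* Bytes moved across the bus by one call (one array in, one array out),
   ignoring the small scalar arguments. */
size_t bytes_transferred(int n)
{
    return 2u * (size_t)n * sizeof(double);
}
```

When the same array feeds several offloads, the Intel clauses also accept alloc_if and free_if modifiers that keep a buffer resident on the coprocessor between regions, which avoids repeating the copy.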

There are two offload models a developer must consider when programming an application: the non-shared memory model and the virtual shared memory model. Both models can be used in the same application.

The non-shared model uses the offload pragma (C/C++) or directives (Fortran). It suits flat data structures, such as scalars and arrays, that can be copied to coprocessor memory without any pointers involved. The virtual shared memory model uses Cilk keywords (_Cilk_shared and _Cilk_offload) and allows variables to be used by both host and coprocessor code. Coherence of those shared variables is maintained at the beginning and end of each offload statement.
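
The virtual shared memory model can be sketched the same way, assuming the Intel compiler's _Cilk_shared and _Cilk_offload keywords. The SHARED and OFFLOAD macros, and the names counts, fill_counts and fill_and_sum, are hypothetical: the macros expand to the keywords when __INTEL_OFFLOAD is defined and to nothing otherwise, so the example still builds and runs on the host. Unlike the pragma version, no data clauses are needed, because counts lives in shared virtual memory.

```c
#include <assert.h>

/* Hypothetical portability macros: use the Intel keywords when offload is
   enabled, otherwise expand to nothing so any C compiler accepts the code. */
#ifdef __INTEL_OFFLOAD
#define SHARED  _Cilk_shared
#define OFFLOAD _Cilk_offload
#else
#define SHARED
#define OFFLOAD
#endif

#define N 8

/* Declared shared: visible to host and coprocessor code, with the runtime
   synchronizing it at the start and end of each offload. */
SHARED int counts[N];

/* A shared function is compiled for both the host and the coprocessor. */
SHARED void fill_counts(int base)
{
    for (int i = 0; i < N; ++i)
        counts[i] = base + i;
}

/* Runs fill_counts via _Cilk_offload (on the host when built without
   offload support), then sums the values the host sees afterwards. */
int fill_and_sum(int base)
{
    OFFLOAD fill_counts(base);

    int sum = 0;
    for (int i = 0; i < N; ++i)
        sum += counts[i];
    return sum;
}
```

For base = 10 the host reads back 10 through 17 and fill_and_sum returns 108, with no explicit copy in the source.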

Understanding these memory models is important when developing applications that use both the processor and the coprocessor.

Source: Intel, USA