OpenCL for Performance

In the past, a developer could write a fairly simple, sequential program that performed the required work, gave the correct answer, and worked on a variety of computing platforms. As long as the target instruction set was known, the application could be compiled and run on any system with that instruction set, and its performance would roughly track the speed of the processors. In today's large-scale HPC systems, however, different kinds of computing cores may be present, and architectural features will most likely change in the future. To future-proof a program, a developer may choose to use a library that insulates the application from some of the underlying complexity.

OpenCL is an open standard that provides a common programming interface to a range of architectures. It can address various accelerators and specialized processors, including GPUs, FPGAs, and CPUs. A single program can be written to use the hardware at hand, and to do so efficiently, without the developer having to know the intricacies of each architecture. OpenCL has evolved rapidly since its introduction in 2008.

A typical OpenCL environment follows the platform model: a host system plus one or more devices, where most of the computing takes place. It is important to understand the memory hierarchy when programming for these different computing elements.

In a heterogeneous environment, different memory regions are available to the different processing elements. Host memory is typically visible only to the host processors, global memory is visible to both the host and the devices, and constant memory is read-only for the devices. Within an OpenCL device there is also local memory, which is visible only to that device's processing elements (more precisely, it is shared among the work-items of a single work-group).
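The memory regions described above correspond to address-space qualifiers in OpenCL C kernel code. The sketch below is plain C rather than a real OpenCL program: the qualifiers are stood in for by empty macros and an ordinary host loop plays the role of the work-items, so it compiles without an OpenCL SDK. The names `scale_and_sum_group`, `run_groups`, and `GROUP_SIZE`, and the macro names, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: empty stand-ins for the OpenCL C address-space qualifiers,
   so this emulation compiles as ordinary C. In a real kernel they would be
   the keywords __global, __constant, and __local. */
#define GLOBAL_MEM   /* __global: visible to the host and to all work-items */
#define CONSTANT_MEM /* __constant: read-only on the device */
#define LOCAL_MEM    /* __local: shared by the work-items of one work-group */

#define GROUP_SIZE 4 /* hypothetical work-group size */

/* Emulates one work-group. Each "work-item" scales one input element,
   stages the result in local memory, and the group's sum is then written
   back to global memory. */
static void scale_and_sum_group(GLOBAL_MEM const float *in,
                                GLOBAL_MEM float *group_sums,
                                CONSTANT_MEM const float *factor,
                                size_t group_id)
{
    LOCAL_MEM float scratch[GROUP_SIZE];

    for (size_t local_id = 0; local_id < GROUP_SIZE; ++local_id) {
        size_t global_id = group_id * GROUP_SIZE + local_id;
        /* 'value' is an ordinary local variable here; on a device it
           would live in the work-item's private memory. */
        float value = in[global_id] * *factor;
        scratch[local_id] = value;
    }

    /* A real kernel would need barrier(CLK_LOCAL_MEM_FENCE) at this point,
       because the work-items run concurrently. The sequential loop above
       makes it unnecessary in this emulation. */
    float sum = 0.0f;
    for (size_t i = 0; i < GROUP_SIZE; ++i)
        sum += scratch[i];
    group_sums[group_id] = sum;
}

/* Host-side stand-in for launching the kernel over n elements. */
static void run_groups(const float *in, float *group_sums,
                       float factor, size_t n)
{
    assert(n % GROUP_SIZE == 0); /* this sketch handles whole groups only */
    for (size_t g = 0; g < n / GROUP_SIZE; ++g)
        scale_and_sum_group(in, group_sums, &factor, g);
}
```

Calling `run_groups(in, sums, 2.0f, 8)` on eight input values fills `sums[0]` and `sums[1]` with the scaled sums of the first and second group of four. On a real device the host would instead copy `in` into a global-memory buffer and enqueue the kernel.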

When optimizing an existing application for the range of devices that OpenCL supports, there are a number of principles to consider:

  • Expose maximum parallelism – look at the algorithms
  • Apply known optimizations – for example, ensure memory access patterns suit the device
  • Test across multiple devices – after all, this is what OpenCL is all about
  • Use the most extreme devices – always test and optimize on the Intel Xeon Phi and GPUs
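The first two principles can be illustrated with a small sketch, again in plain C as an emulation rather than real OpenCL code. The loop body is written as a per-element function with no dependence between iterations, so each iteration could become one work-item, and consecutive indices touch consecutive addresses (a unit-stride access pattern). The function names `saxpy_item` and `saxpy_range` are hypothetical. In an actual OpenCL kernel the index would come from `get_global_id(0)` and the loop would be replaced by a call to `clEnqueueNDRangeKernel` on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-work-item body: y[gid] = a * x[gid] + y[gid].
   It reads and writes only element gid, so all iterations are independent
   and can run in any order or in parallel. */
static void saxpy_item(size_t gid, float a, const float *x, float *y)
{
    y[gid] = a * x[gid] + y[gid];
}

/* Sequential stand-in for launching n work-items. Neighbouring gid values
   access neighbouring elements of x and y (unit stride). */
static void saxpy_range(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t gid = 0; gid < n; ++gid)
        saxpy_item(gid, a, x, y);
}
```

Which access pattern performs best can differ between device types, which is one reason the list recommends testing on more than one device.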

OpenCL is a fairly new programming model designed to help programmers get the most out of a variety of processing elements in heterogeneous environments. Many available benchmarks have demonstrated that excellent performance can be obtained across a wide variety of devices. Rather than being locked into one specific accelerator, applications that use OpenCL can run on a number of different architectures.

Source: Intel, USA
