Both the Intel Xeon processor and the Intel Xeon Phi coprocessor continue to increase in performance with each generation. To get maximum performance from these architectures, it is important to use compiler directives that exploit the features of each one. Compilers are conservative when deciding whether a loop can be vectorized: if they cannot prove that vectorization is safe, they leave the loop scalar. With human intervention, the SIMD (Single Instruction Multiple Data) units of these architectures can be used more aggressively.
Vector instruction sets have progressed over time, and it is important to use the most appropriate vector instruction set for the hardware an application runs on. The OpenMP SIMD directive lets the developer explicitly tell the compiler to vectorize a loop. The directive overrides the compiler's own dependency analysis, which is acceptable only when the developer knows the loop has no dependencies that would make vectorization unsafe.
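As a minimal sketch of the directive overriding the compiler's dependency analysis, consider the C loop below. The function name saxpy and its signature are illustrative assumptions, not taken from the original. Because x and y are plain pointers, the compiler cannot prove that they do not overlap; the pragma asserts that the iterations are independent, so the caller becomes responsible for that guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i]  (illustrative "saxpy" kernel).
   x and y are plain pointers, so the compiler cannot prove they do not
   overlap and may vectorize cautiously or not at all. "omp simd" asserts
   that the iterations are independent; the caller must therefore pass
   arrays that do not overlap in a way that creates a cross-iteration
   dependence. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Compiled without an OpenMP option (for example -fopenmp or -fopenmp-simd in GCC and Clang), the pragma is ignored and the loop is ordinary C, so the results are the same; only the generated instructions differ.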
Many Intel Xeon processors support the AVX (Advanced Vector Extensions) instructions and have 256-bit vector units in each core; newer processors may contain a 512-bit vector unit per core. Applications that use both more threads and longer vector units show large performance gains as processor technology has progressed.
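The arithmetic behind longer vector units can be sketched in C: a vector register holds (vector width) / (element width) lanes. This sketch assumes 32-bit float and 64-bit double, as on Intel Xeon platforms; the helper names lanes_per_vector and full_vector_iters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Elements held by one vector register, e.g. 256 / 32 = 8 floats. */
size_t lanes_per_vector(size_t vector_bits, size_t elem_bytes)
{
    assert(elem_bytes > 0 && vector_bits % (8 * elem_bytes) == 0);
    return vector_bits / (8 * elem_bytes);
}

/* Full-width vector iterations needed for n elements; the leftover
   n % lanes elements are handled by a shorter remainder (tail) loop. */
size_t full_vector_iters(size_t n, size_t lanes)
{
    assert(lanes > 0);
    return n / lanes;
}
```

For 1,000 floats, a 256-bit unit needs 125 full vector iterations, while a 512-bit unit needs 62 plus an 8-element remainder. That halving is an upper bound on the gain from wider vectors; memory bandwidth and remainder handling usually reduce it in practice.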
Loop-vectorization pragmas in C, C++, or Fortran code help the compiler generate vector instructions. Because the developer knows the code well, the omp simd directive (#pragma omp simd in C and C++, !$omp simd in Fortran) can greatly improve the vectorization of an application.
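A bare omp simd is not enough for every loop. A running sum is a loop-carried dependence, so the directive needs a reduction clause to tell the compiler to keep one partial sum per vector lane and combine them after the loop. The sketch below is illustrative; the function name dot is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of a and b. Without reduction(+:sum), "omp simd" would
   assert that the iterations are independent, which is false for sum.
   With the clause, each lane accumulates its own partial sum and the
   partial sums are added together when the loop finishes. */
double dot(size_t n, const double *a, const double *b)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double sum = 0.0;
#pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Because the partial sums are combined in a different order than in the scalar loop, floating-point results can differ from a serial sum in the last bits.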
Combining OpenMP directives with compiler options that target a specific architecture helps ensure that loops are vectorized properly and that the generated code makes efficient use of that architecture's vector instructions.
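A hedged sketch of tuning a loop for a particular vector width follows. The helper clamp_scale and the simdlen(16) value are illustrative assumptions: 16 float lanes fill one 512-bit register, and 8 would match a 256-bit AVX unit. #pragma omp declare simd asks the compiler to also emit a vector version of the helper so the vectorized loop can call it. The instruction set itself is chosen by compiler options, for example -march=native with GCC or Clang, or -xHost with the Intel compiler, both of which target the build machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element-wise helper: scale, then clamp to at most 1.
   "declare simd" requests an additional vector variant of this function. */
#pragma omp declare simd
static inline float clamp_scale(float v, float s)
{
    float r = v * s;
    return r > 1.0f ? 1.0f : r;
}

/* simdlen(16) is only a hint for the preferred number of concurrent
   iterations (16 floats = 512 bits); the compiler may choose otherwise. */
void clamp_scale_all(size_t n, float s, float *restrict v)
{
    assert(n == 0 || v != NULL);
#pragma omp simd simdlen(16)
    for (size_t i = 0; i < n; i++)
        v[i] = clamp_scale(v[i], s);
}
```

A possible build line is gcc -O3 -fopenmp-simd -march=native; with the Intel compiler, -qopenmp-simd -xHost plays the same role. Hard-coding simdlen trades portability for control, so it is best reserved for code built for one known target.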
Vector units continue to grow wider and vector instruction sets more capable. As hardware capability has increased, compilers have also progressed in generating efficient code for a given architecture. With the OpenMP SIMD directive, developers gain direct control over how loops are vectorized, for example by asserting that a loop is safe to vectorize or by declaring reductions. This can lead to large performance gains beyond those that come from hardware alone. Tests involving different programming languages and various modern platforms show a consistent improvement of 4X to 5X.
Source: Intel, USA