As new technologies emerge that speed up computation, it is important to understand what their new features are and what benefits they offer. Vectorization has been around in one form or another for many years, but with today's latest generation of CPUs and accelerators, it is now available to a wide range of applications.
The Intel Xeon Phi processor contains a new vector instruction set, the 512-bit Advanced Vector Extensions (AVX-512). For applications to get a high level of performance from the latest systems, developers should investigate different approaches to vectorization and choose the one that best fits their circumstances. Different methods can be used, and they can be mixed within an application. One important point is that vectorization requires an understanding of the data, because the data layout is critical to using this feature.
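To illustrate why data layout matters, here is a minimal sketch in C that contrasts two common layouts for the same data. The type and function names (`PointAoS`, `PointsSoA`, `scale_x_aos`, `scale_x_soa`) are hypothetical and chosen only for this example. In the array-of-structures layout, consecutive `x` values are separated by the other fields, so the loop accesses memory with a stride. In the structure-of-arrays layout, the `x` values are contiguous, which is the access pattern vector loads and stores generally handle best. Whether a given compiler actually vectorizes either loop depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structures (AoS): x, y and z of one point are adjacent, so
   consecutive x values sit three doubles apart in memory. */
typedef struct {
    double x, y, z;
} PointAoS;

/* Structure of arrays (SoA): all x values are contiguous in memory. */
typedef struct {
    double *x;
    double *y;
    double *z;
} PointsSoA;

/* Scale every x coordinate, AoS layout: strided access to x. */
void scale_x_aos(PointAoS *pts, size_t n, double factor)
{
    assert(n == 0 || pts != NULL);
    for (size_t i = 0; i < n; ++i)
        pts[i].x *= factor;
}

/* Scale every x coordinate, SoA layout: unit-stride access to x. */
void scale_x_soa(PointsSoA *pts, size_t n, double factor)
{
    assert(n == 0 || (pts != NULL && pts->x != NULL));
    for (size_t i = 0; i < n; ++i)
        pts->x[i] *= factor;
}
```

Both functions compute the same result. The difference lies only in how the data is arranged in memory, which is why understanding the layout comes before choosing a vectorization method.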
When considering vectorization, several tools can help the developer decide where to look further. The first step is to read the optimization reports generated by the Intel compiler. The second is to use the Vector Analyzer, which can give specific advice on how to get more vectorization from the code.
The AVX-512 vector operations, along with other features designed into the Intel Xeon Phi processor, are backed by a wide range of software options that help many types of applications run significantly faster than before.
Since different methods can be used to achieve faster performance, there is always a tradeoff between portability and the absolute highest performance. For math functions, using a library that runs on a variety of platforms makes it easier to move the application to different end-user hardware environments transparently. The Intel Math Kernel Library (MKL) can assist developers by providing optimized implementations of many types of math functions, using vectorization where appropriate.
Vectorization can lead to large gains in performance. A 512-bit AVX-512 register holds eight double-precision values, so a single instruction can perform eight double-precision mathematical operations simultaneously, which speeds up the parts of the code that use the AVX-512 instruction set. Realizing these gains depends on knowing the data layout and where vectorization is possible.
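The figure of eight double-precision operations follows from simple arithmetic on the register width. The sketch below works it out in C. The helper names (`avx512_lanes`, `vector_iterations`, `remainder_iterations`) are hypothetical, and the model is an idealized one: it counts how many elements fit in a 512-bit register and how a loop of `n` elements splits into full vector iterations plus leftover scalar iterations. It says nothing about memory bandwidth or other limits on real speedup.

```c
#include <assert.h>
#include <stddef.h>

/* Width of one AVX-512 vector register in bits. */
enum { AVX512_REGISTER_BITS = 512 };

/* Number of elements of the given size (in bytes) that fit in one
   512-bit register, i.e. the number of lanes. */
size_t avx512_lanes(size_t element_bytes)
{
    assert(element_bytes > 0);
    return (AVX512_REGISTER_BITS / 8) / element_bytes;
}

/* Number of full vector iterations needed to cover n elements. */
size_t vector_iterations(size_t n, size_t lanes)
{
    assert(lanes > 0);
    return n / lanes;
}

/* Elements left over for a scalar or masked remainder loop. */
size_t remainder_iterations(size_t n, size_t lanes)
{
    assert(lanes > 0);
    return n % lanes;
}
```

On a platform where `double` is 8 bytes and `float` is 4 bytes, `avx512_lanes(sizeof(double))` gives 8 and `avx512_lanes(sizeof(float))` gives 16. A loop over 1003 doubles would then split into 125 full vector iterations and 3 remainder elements.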
Generally, there are three approaches to vectorizing code. The most obvious and easiest is to use libraries, which exploit the vector capabilities when possible. The second is auto-vectorization, or letting the compiler do the work. The developer can help with hints, yet the main work of determining what can be vectorized lies with the compiler. The third method, which is the most time consuming yet can yield the best results (fastest execution), is to add SIMD directives or pragmas to the code itself. This lets the developer mandate vectorization, and it assumes the developer knows that the affected regions are safe to vectorize. Each of these methods has its own techniques and methodologies that, when followed, can result in excellent performance.
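The second and third approaches can be sketched side by side on the same loop. The example below is an illustration in C with hypothetical function names (`daxpy_auto`, `daxpy_simd`), not code taken from any Intel library. The first version relies on auto-vectorization and uses the C99 `restrict` qualifier as a hint that the arrays do not overlap. The second uses the OpenMP `#pragma omp simd` directive to request vectorization explicitly. The directive only takes effect when the compiler's OpenMP SIMD support is enabled (for example `-fopenmp-simd` with GCC or `-qopenmp-simd` with the Intel compiler); otherwise it is ignored and the loop is compiled as ordinary C. For the first approach, a library routine such as `cblas_daxpy` from a BLAS implementation would replace the loop entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Approach 2: auto-vectorization with a hint. The restrict qualifiers
   promise the compiler that x and y do not overlap, which removes one
   common obstacle to vectorizing the loop. The decision to vectorize
   still rests with the compiler. */
void daxpy_auto(size_t n, double a,
                const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Approach 3: explicit SIMD directive. The pragma asserts that the loop
   is safe to vectorize; the developer is responsible for that being
   true (here, x and y must not partially overlap). */
void daxpy_simd(size_t n, double a, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions produce the same values. What differs is who carries the responsibility: with `restrict` the compiler still analyzes the loop and may decline to vectorize it, while with the directive the developer vouches for its safety.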
Overall, the most important point is that the developer must understand the algorithms and the placement of data in order to get the best performance. The available tools span a range: some require little effort and keep the code highly portable, while others demand a lot of effort and yield code with little portability. Fortunately, Intel offers a wide range of options to help developers modernize their applications and take advantage of the latest hardware offerings.