Employing a hybrid of MPI across nodes in a cluster, multithreading with OpenMP* on each node, and vectorization of loops within each thread results in multiple performance gains. In fact, most application codes will run slower on the latest supercomputers if they run purely sequentially. This means that adding multithreading and vectorization to applications is now essential for running efficiently on the latest architectures.
Let The Compiler Do Its Thing
“In the past, developers would get best results if a loop was unrolled, that is, duplicating the body as many times as needed to that the operations could be operated on using full vectors. The number of iterations would reflect the hardware that the code was targeted towards. Since the application may have to run on different hardware in the future, results for todays generation of hardware may be compromised in the future. In fact, it is better to let modern compilers to the unrolling.”
Vectorization Leads to Performance Gains
Applications that can take advantage of the new vectorization capabilities of the Intel Xeon Phi processor will show tremendous performance gains. “When considering vectorization, there are different tools that can assist the developer in determining where to look further. The first is to look at the optimization reports that are generated by the Intel compiler and then to also use the Vector Analyzer that can give specific advice on what to do to get more vectorization from the code.”
Modernizing Code with the Intel Vectorization Advisor
Threading plus vectorization together can increase the performance of an application more than one technique or the other. Threading and vectorizing an application are two techniques that are known to increase the performance of an application using modern CPUs and coprocessors. However, a deep understanding of the application is needed in order to make the decisions needed and to rewrite portions of the application to take advantage of these techniques. In cases where the developer might not be familiar with the code an automated tools such as the Intel Vectorization Advisor can assist the developer.