Preparing Code For Parallel Execution


Sponsored Post

With the tremendous compute density of new processors, it is important to understand whether an application can take advantage of multiple cores. New systems that combine different computing elements may or may not be the best fit for a given algorithm or application. Before doing the work needed to reach the expected performance, developers should determine whether an application is ready to run in a highly vectorized or many-core environment.

A first approach is to run the application, assumed here to be multithreaded already, on an increasing number of cores. Performance should increase as the core count does. If performance stops increasing, or even decreases, as cores are added, then parts of the application need to be improved before it can take advantage of the large number of cores that might be present.
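The scaling test described above comes down to comparing run times across core counts. The following is a minimal sketch in Python, assuming the wall-clock times have already been measured by running the application at each core count. The function names, the 5% improvement threshold, and the sample timings are all hypothetical and chosen only for illustration.

```python
def scaling_report(timings):
    """Return (cores, speedup, efficiency) for each measured core count.

    timings: dict mapping core count -> wall-clock seconds.
    Speedup and efficiency are relative to the smallest core count measured.
    """
    cores = sorted(timings)
    base_cores = cores[0]
    base_time = timings[base_cores]
    report = []
    for n in cores:
        speedup = base_time / timings[n]
        # Efficiency of 1.0 means ideal linear scaling from the baseline run.
        efficiency = speedup / (n / base_cores)
        report.append((n, speedup, efficiency))
    return report


def first_stall(timings, min_gain=1.05):
    """Return the first core count whose run time did not improve by at
    least a factor of min_gain over the previous core count, or None."""
    cores = sorted(timings)
    for prev, cur in zip(cores, cores[1:]):
        if timings[prev] / timings[cur] < min_gain:
            return cur
    return None


# Hypothetical timings (seconds), not measurements from a real application.
example = {1: 100.0, 2: 52.0, 4: 28.0, 8: 20.0, 16: 21.0}

for n, speedup, efficiency in scaling_report(example):
    print(f"{n:3d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
print("scaling stalls at:", first_stall(example))
```

With these made-up numbers, the run on 16 cores is slower than the run on 8, so the sketch reports 16 as the point where scaling stalls. Falling efficiency at lower core counts is an earlier hint that some part of the application is limiting scaling.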

Another simple test is to determine whether vectorization is possible. Compiling the application with different compiler options and comparing the resulting performance quickly shows whether vectorization helps. Building once with Intel auto-vectorization enabled and once with it disabled gives a quick idea of the potential gain. Algorithms may have to be modified to take advantage of vectorization, at least enough to let the auto-vectorizer do its work. Relatedly, if the application relies on libraries for some of the heavy work, make sure to call libraries that have been tuned to use vectorization. That way at least part of the application runs vectorized, even if the main-line code gains little.
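The comparison of two builds can be automated with a small timing harness. The sketch below, in Python, assumes two executables have already been built from the same source, one with auto-vectorization enabled and one with it disabled. The binary names `./app_vec` and `./app_novec` are hypothetical, and the compiler options are deliberately left out because they vary by compiler and version.

```python
import statistics
import subprocess
import time


def time_command(cmd, repeats=5):
    """Run cmd (a list of arguments) several times and return the median
    wall-clock time in seconds. The median damps one-off system noise."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises an error if the program exits with a failure code.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def vectorization_speedup(novec_cmd, vec_cmd, repeats=5):
    """Ratio of non-vectorized to vectorized run time.
    A value well above 1.0 suggests vectorization helps this workload."""
    return time_command(novec_cmd, repeats) / time_command(vec_cmd, repeats)


# Example call, assuming the two hypothetical binaries exist:
# ratio = vectorization_speedup(["./app_novec"], ["./app_vec"])
# print(f"vectorized build is {ratio:.2f}x faster")
```

A ratio close to 1.0 suggests that either the hot loops are not being vectorized or the run time is dominated by something else, such as memory access or I/O. This measures whole-program wall-clock time including process startup, so it is only meaningful for runs that last much longer than the launch overhead.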

Another readiness check applies to applications that use MPI, for example: the communication overhead must remain low compared with the performance gained from the additional processors. Spending too much time communicating between processors, without giving each processor enough work to do, wastes computing resources.
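One way to put a number on this trade-off is to compare the time spent communicating with the time spent computing. The sketch below, in Python, uses a deliberately simple model that is an assumption and not taken from the article: the computation divides evenly across ranks, and a fixed communication cost is added on top. The function names and sample values are hypothetical. In a real MPI code the two times would be measured, for instance with timers placed around the communication calls.

```python
def comm_fraction(compute_s, comm_s):
    """Fraction of a rank's run time spent communicating."""
    total = compute_s + comm_s
    if total <= 0:
        raise ValueError("total time must be positive")
    return comm_s / total


def modeled_speedup(serial_s, ranks, comm_s):
    """Speedup under a simple assumed model:
    parallel time = serial_s / ranks + comm_s.
    Real applications rarely divide this cleanly; this is only a first estimate."""
    parallel_s = serial_s / ranks + comm_s
    return serial_s / parallel_s


# Hypothetical values: 120 s of serial work split over 8 ranks,
# with 5 s of communication overhead per run.
per_rank_compute = 120.0 / 8
print("communication fraction:", comm_fraction(per_rank_compute, 5.0))
print("modeled speedup on 8 ranks:", modeled_speedup(120.0, 8, 5.0))
```

In this made-up case a quarter of each rank's time goes to communication, and the modeled speedup on 8 ranks is 6 rather than the ideal 8. As more ranks are added, the compute share per rank shrinks while the communication cost does not, so the fraction grows and the return on extra processors falls.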
