The benefits of nested parallelism on highly threaded applications can be determined and quantified. With the number of cores in both the host CPU (Intel Xeon) and the coprocessor (Intel Xeon Phi) continues to increase, much thought must be given to minimizing the thread overhead when many threads need to be synchronized, as well as the memory access for each processor (core). Tasks that can be spread across an entire system to exploit the algorithm’s parallelism, should be mapped to the NUMA node to make them more efficient.
“Applications can be tuned to use both the Intel Xeon and the Intel Xeon Phi simultaneously, without modifying the code to just run on the coprocessor. Using a number of software tools from Intel, performance of a coupled cluster method can be demonstrated to gain a tremendous performance with excellent scaling.”
At the Centre for High-Performance Computing (CHPC) in South Africa, the mission is to enable cutting-edge research by supporting the highest levels of HPC available. That means ensuring researchers – who often are not experienced with computers let alone HPC systems – to get their work done with the HPC getting in the way.
“As the use of coprocessors increases to speedup HPC applications, it is important to understand how much additional power the coprocessors use. With various measurements and benchmarks arising to calculate the power used during the running of compute and data intensive applications, measuring the power draw from an Intel Xeon Phi coprocessor is important to understanding the best use of resources.”