Application Performance & Power Consumption on Intel Xeon Phi

While all computer systems require electrical power, those used in highly scalable HPC environments consume in the range of 10 to 20 MW. Even though performance per watt has improved tremendously over the years, the cost of powering such large systems is substantial. It is estimated that an Exascale system built with 2014-2015 technologies would require several gigawatts of power. Many RFPs today require that vendors specify the operating metrics of large systems.

While new technology will be developed that reduces the power needed per operation, in today’s environments it is important to understand how an application affects power usage. Modern applications are often optimized to take advantage of both the Intel Xeon CPU and the Intel Xeon Phi coprocessor, and both of these include various power states that can minimize power consumption when idle.

Another feature worth investigating is how processor affinity for software threads affects power draw. In an OpenMP environment, various techniques can be used to control where threads are placed. Essentially, the choice is between keeping the threads close to one another and scattering them over as many cores as possible. Thread mapping can affect both performance and power use. Tests with some standard benchmarks show that on the Intel Xeon Phi coprocessor the best Mflops per watt occurs with the scatter placement. As more threads are used (from 1 to 229), performance increases, but so does power use. A good measure of the efficiency of a system using the Intel Xeon Phi coprocessor is therefore how much performance is obtained per watt consumed. By this measure, the “scatter” placement consistently beats keeping the threads together.
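To make the difference between the two placements concrete, here is a minimal sketch in Python. It is a simplified model, not the logic of any real OpenMP runtime: the function names, the 4-core layout, and the 4 hardware threads per core are all assumptions chosen for illustration. With the Intel OpenMP runtime, placement of this kind is typically requested through the KMP_AFFINITY environment variable (for example "compact" or "scatter"), and the actual mapping depends on the machine topology.

```python
def place_threads(n_threads, n_cores, threads_per_core, policy):
    """Return the core index assigned to each thread (simplified model).

    All parameters are hypothetical inputs for illustration; a real
    runtime also accounts for sockets, caches, and OS numbering.
    """
    if n_threads > n_cores * threads_per_core:
        raise ValueError("more threads than hardware thread contexts")
    if policy == "compact":
        # Fill every hardware thread of one core before using the next core.
        return [t // threads_per_core for t in range(n_threads)]
    if policy == "scatter":
        # Deal threads round-robin so they spread over as many cores as possible.
        return [t % n_cores for t in range(n_threads)]
    raise ValueError("policy must be 'compact' or 'scatter'")


def cores_in_use(placement):
    """Count how many distinct cores have at least one thread."""
    return len(set(placement))


if __name__ == "__main__":
    # Hypothetical machine: 4 cores, 4 hardware threads each, 8 software threads.
    for policy in ("compact", "scatter"):
        placement = place_threads(8, 4, 4, policy)
        print(policy, placement, "cores in use:", cores_in_use(placement))
```

In this model, the same eight threads occupy two cores under the compact policy and all four cores under the scatter policy, which is the kind of difference that can change both performance and power draw.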

Running a sample application on the Intel Xeon CPU, in addition to the Intel Xeon Phi coprocessor, shows similar results. As the number of threads increases, the performance per watt is higher when the threads are free to be associated with any core.
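The performance-per-watt metric itself is simple arithmetic, sketched below in Python. The function name and its inputs are assumptions for illustration, and the numbers in the usage line are made-up placeholders, not measurements from the tests described here. In practice the operation count would come from the benchmark and the average power from whatever power-monitoring interface the platform provides.

```python
def mflops_per_watt(flop_count, elapsed_seconds, average_watts):
    """Compute Mflops per watt from a run's totals.

    flop_count: floating-point operations performed during the run
    elapsed_seconds: wall-clock time of the run
    average_watts: mean power draw measured over the same interval
    """
    if elapsed_seconds <= 0 or average_watts <= 0:
        raise ValueError("time and power must be positive")
    mflops = flop_count / elapsed_seconds / 1e6  # sustained Mflops
    return mflops / average_watts


if __name__ == "__main__":
    # Placeholder values only: 2e9 operations in 1 s at an average of 100 W.
    print(mflops_per_watt(2e9, 1.0, 100.0))
```

Computing this value for each thread count and each placement policy gives the curves that such a comparison is based on.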

Additional investigations can be performed to look at larger power usage issues across an entire data center.

Source:  Intel, USA;  PNNL, USA
