Monitoring Power Consumption with the Intelligent Platform Management Interface

While performance is still king in the HPC world, increasing attention is being paid to the power consumed by a given set of computations. New benchmarks are being created that go beyond absolute (Tflop) performance and report a performance-per-watt value at the end of the run.
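As a rough illustration of what a performance-per-watt figure means, the sketch below divides sustained GFLOPS by average power draw. The function name and the numbers in the example run are hypothetical, and real benchmarks define their own measurement rules.

```python
def gflops_per_watt(total_flop: float, runtime_s: float, avg_power_w: float) -> float:
    """Return sustained GFLOPS divided by the average power draw in watts."""
    if runtime_s <= 0 or avg_power_w <= 0:
        raise ValueError("runtime and average power must be positive")
    gflops = total_flop / runtime_s / 1e9  # sustained rate over the whole run
    return gflops / avg_power_w


# Hypothetical run: 3.6e15 floating-point operations in 600 s at a 2 kW average.
efficiency = gflops_per_watt(3.6e15, 600.0, 2000.0)
print(f"{efficiency:.2f} GFLOPS/W")
```

The average power here is assumed to cover the same time window as the run, which is why power has to be sampled while the application is executing.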

Previously, some methods for ensuring the best performance per unit of power at the system level were discussed. However, since a large HPC installation consists of a large cluster of individual servers, a method is needed to visualize the power used across the cluster over the time that the application is running. Viewing the data over an entire cluster can help users and administrators detect inefficiencies that lead to more power being used than expected.

NWPerf is software that can measure and collect a wide range of performance data about an application or set of applications running on a cluster. With minimal impact on performance, NWPerf can gather historical information that can then be used in a visualization package. The data collected includes power consumption, obtained through the Intelligent Platform Management Interface (IPMI) for the Intel Xeon processor and the libmicmgmt API for the Intel Xeon Phi coprocessor. Once the data is collected and extracted, it is possible to examine the power used across the cluster while the application is running.
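To make the IPMI step concrete, here is a minimal sketch of reading a node's power draw by running `ipmitool dcmi power reading` and parsing its output. This is not NWPerf's collector. It assumes `ipmitool` is installed and that the node's BMC supports the DCMI power reading command, which not every system does. The sample text is illustrative and the function names are hypothetical.

```python
import re
import subprocess

# Matches the instantaneous reading line in `ipmitool dcmi power reading` output.
_POWER_RE = re.compile(r"Instantaneous power reading:\s*(\d+)\s*Watts")


def parse_dcmi_power(text: str) -> int:
    """Extract the instantaneous power reading, in watts, from ipmitool output."""
    match = _POWER_RE.search(text)
    if match is None:
        raise ValueError("no instantaneous power reading found")
    return int(match.group(1))


def read_node_power() -> int:
    """Query the local BMC (assumes ipmitool, DCMI support, and sufficient privileges)."""
    result = subprocess.run(
        ["ipmitool", "dcmi", "power", "reading"],
        capture_output=True, text=True, check=True,
    )
    return parse_dcmi_power(result.stdout)


# Illustrative output only, not captured from a real system.
SAMPLE_OUTPUT = """
    Instantaneous power reading:                   220 Watts
    Minimum during sampling period:                 15 Watts
    Maximum during sampling period:                350 Watts
    Average power reading over sample period:      210 Watts
"""

print(parse_dcmi_power(SAMPLE_OUTPUT))
```

Keeping the parser separate from the subprocess call makes it possible to check the parsing logic on a machine without a BMC. A collector would call `read_node_power()` at a fixed interval and store each value with a node name and timestamp.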

Once the data is collected, an application such as CView can be used to create waterfall charts. 3D charts can be created in which the axes show time, nodes, and a color-coded metric. For example, the metric can be power measured over time, or CPU usage, which may be low when the application is limited by memory bandwidth. By combining the waterfall charts across a cluster, important information can be extracted, which can lead to a better understanding of how the cluster is used.
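The data behind such a chart is essentially a node-by-time grid of metric values. The sketch below shows one way to build that grid from raw power samples. It is not CView's implementation. The sample tuple layout, the function name, and the 60-second bucket width are assumptions made for illustration.

```python
from collections import defaultdict


def build_waterfall(samples, bucket_s=60):
    """Average (node, timestamp_s, watts) samples into a node-by-time grid.

    Returns (nodes, buckets, grid), where grid[i][j] is the mean power of
    nodes[i] during time bucket buckets[j], or None if no sample fell there.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (node, bucket) -> [sum of watts, count]
    for node, timestamp_s, watts in samples:
        key = (node, int(timestamp_s // bucket_s))
        totals[key][0] += watts
        totals[key][1] += 1

    nodes = sorted({node for node, _ in totals})
    buckets = sorted({bucket for _, bucket in totals})
    grid = [
        [
            totals[(node, bucket)][0] / totals[(node, bucket)][1]
            if (node, bucket) in totals else None
            for bucket in buckets
        ]
        for node in nodes
    ]
    return nodes, buckets, grid


# Hypothetical samples from two nodes over two minutes.
samples = [
    ("node01", 0, 200), ("node01", 30, 220), ("node01", 60, 300),
    ("node02", 10, 180), ("node02", 70, 190),
]
nodes, buckets, grid = build_waterfall(samples)
for node, row in zip(nodes, grid):
    print(node, row)
```

Each row is one node and each column one time bucket, so mapping the values to colors gives the waterfall view, and a row that stays hotter than its neighbors stands out as a node drawing more power than expected.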

Measuring parameters such as power and CPU usage while an application is running is essential to understanding the performance of an HPC application. This information should not be ignored.

Source:  Intel, USA;  PNNL, USA

