In this special guest feature, Siddhartha Jana provides an update on cross-community efforts to improve energy efficiency in the software stack.

Siddhartha Jana is an HPC Research Scientist and a member of the Energy Efficient HPC Working Group, which now has more than 700 members in the Americas, Europe, Asia, Africa, and Australia.
This article covers events at SC17 that focused on energy efficiency and highlights ongoing collaborations across the community to develop advanced software technologies for system energy and power management.
Over the past two decades, the computing industry has undergone a transition from the terascale era (compute throughput = 10^12 Flop/s) to the petascale era (compute throughput = 10^15 Flop/s). This transition has been characterized by significant changes in hardware. The HPC industry now appears to be reaching a crossover point where hardware innovations must be accompanied by significant changes in the software stack in order to reach the exascale era, where compute throughput will exceed 10^18 Flop/s. The HPC community has set a bold goal of delivering such exascale machines in the 2020-2022 timeframe. Exascale initiatives have been announced worldwide: the European Commission's exascale projects, the US DOE's Exascale Computing Project, Japan's Flagship 2020 project, and China's exascale project.
1. The need to invest in energy and power management in the system stack
One of the primary challenges for exascale is the total cost and variability of the power consumed, not only by the HPC system but also by the additional infrastructure supporting it. The engineering efforts invested in designing an exascale machine are constrained by a tight power envelope of 20-30 MW. The motivation for controlling this cost differs drastically from one site to another. For some sites, external factors such as electricity shortages, natural disasters (e.g., tsunamis and earthquakes), and government-issued mandates limit the supply of power. For others, limitations in the design of the facility that houses the system can lead to power shortages. Some sites are driving research in this field with the goal of mitigating the environmental impact of computation. Another motivation is to reduce electricity costs in order to improve future purchasing power for computing resources. Even sites that do not face an immediate power shortage have an active interest in staying "ahead of the curve" and in improving system reliability and resiliency. These concerns are amplified by current trends: the variability in power draw of future systems (i.e., the delta between the completely idle state and peak Linpack power during a full-system run) is projected to be as high as 40 MW. In addition, any fault in software or hardware may lead to premature termination of applications, incurring additional power costs with zero gains.

Figure 1. Top500 trends in performance and energy efficiency. [Data source: Top500.org, 2005-2017]
2. Why turn to software for power savings?
The past decade has seen the end of two major computing laws. The first of these, Dennard scaling, predicted an exponential rise in the performance per watt of microprocessors built with silicon transistors. Its breakdown was brought about by an increase in power leakage within circuits, which limited the ability of CPUs to boost performance by raising the operating frequency. This led to a change in hardware architecture: computational parallelism was introduced. Almost a decade has passed since then, and as of this writing the second law is facing a similar fate: Moore's law, which predicted that the transistor count on a silicon chip would double roughly every two years. Two main factors contributing to its slowdown are the limits imposed by the laws of physics and the high power density of silicon transistors. Figure 2 depicts this behavior by showing the exponential rise in transistor counts and the flattening of the curves for clock speed, power, and performance per clock cycle. As a result, power consumption has become one of the most significant concerns in designing future exascale systems. Since improvements in hardware design are expected to dictate the upper bounds of system efficiency, the software stack needs to adapt itself to leverage hardware features in order to meet these bounds.

Figure 2. Design trends in transistor count, performance, frequency, power, and number of cores (1970-2015). [Data sources: 1970-2010 raw data and 2011-2017 raw data; graph plot source: weblink]
While working to increase system efficiency, one question that arises is where within the software ecosystem power monitoring and control schemes should be introduced. The answer may not be limited to a single layer of the software stack. Figure 2.1 presents an overview of a software ecosystem that is not limited to the HPC system itself; it also includes the secondary infrastructure that supports the facility housing the machine.
Current efforts focus on the HPC system by itself, mainly resource management and static runtime control (e.g., node power capping). What is needed to address the exascale challenge is a data center software stack that can react both dynamically (system and application runtime control) and statically (resource management and scheduling) to changing power and energy constraints. This may also include predictions of data center overheads that combine extrapolated data (machine learning) with changing external factors, such as weather and power grid surpluses or limitations.

Figure 2.1. Participation of multiple software components that expose opportunities for site-wide power management.
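To make the "node power capping" knob mentioned above concrete, the sketch below reads a package energy counter and lowers a package power limit through the Linux powercap sysfs interface exposed by the intel_rapl driver. This is only a minimal sketch under stated assumptions: the sysfs paths and the 100 W cap are platform- and kernel-dependent placeholders, and writing the limit normally requires root privileges.

```c
/*
 * Minimal sketch of node-level power capping through the Linux powercap
 * sysfs interface (intel_rapl driver).  Paths vary by platform and kernel;
 * writing the limit typically requires root privileges.
 */
#include <stdio.h>

#define RAPL_PKG0 "/sys/class/powercap/intel-rapl:0"   /* assumed path to package 0 */

static long read_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;
    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}

int main(void)
{
    /* Cumulative package energy counter (microjoules). */
    long energy_uj = read_long(RAPL_PKG0 "/energy_uj");
    /* Current long-term package power limit (microwatts). */
    long limit_uw  = read_long(RAPL_PKG0 "/constraint_0_power_limit_uw");
    printf("energy_uj = %ld, power_limit_uw = %ld\n", energy_uj, limit_uw);

    /* Example: lower the package power cap to 100 W (100,000,000 uW). */
    FILE *f = fopen(RAPL_PKG0 "/constraint_0_power_limit_uw", "w");
    if (!f) {
        fprintf(stderr, "cannot open power limit file (root required?)\n");
        return 1;
    }
    fprintf(f, "%ld\n", 100000000L);
    fclose(f);
    return 0;
}
```

In practice, the frameworks discussed in Section 3 (Power API, GEOPM, Redfish) expose such low-level mechanisms behind portable interfaces.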
3. Power management efforts in the software stack: Updates from SC17
The SC17 conference included multiple technical tracks on active cross-community collaborations that are driving energy and power management from within the software stack. Many of these technical tracks were organized by the Energy Efficient HPC Working Group, and the material for these sessions is available here.
3.1. Interaction among layers of the system software stack
3.1.1. The Exascale Computing Project (ECP):
During the EE HPC WG workshop, James Ang from the Center for Computing Research (Sandia National Laboratories, USA) and manager of the DOE Exascale Computing Project gave an overview of the project's scope and the challenges it aims to tackle. The goal of this program is to develop an HPC ecosystem using "a co-design approach to deliver new software, applications, platforms, and computational science capabilities". The talk presented the logical layout of the ECP software stack and highlighted the need for a system software stack that supports operation within a power envelope of 20-30 MW (Slides).
The talk outlined three active R&D efforts that target different layers of the stack. These projects are part of ongoing collaborations among US DOE laboratories, academia, and vendors. They are briefly discussed below:
(a) Runtime System for Application-Level Power Steering (Lawrence Livermore National Laboratory): This work focuses on "safe execution and performance optimization" of applications running in a power-limited environment. The aim is to extend and promote the widespread use of Intel's Global Extensible Open Power Manager (GEOPM) in order to optimize the performance of ECP applications, with and without energy and power constraints, while ensuring safe execution (refer to Figure 3).

Figure 3. Power management in ECP software stack: Runtime System for Application-Level Power Steering
(b) Exascale Performance API (UT-Knoxville): This work focuses on designing a "consistent interface and methodology" for monitoring hardware- and software-based performance events. The goal is to extend Exa-PAPI to monitor hardware performance counters and software-defined events, and to monitor and control power-related metrics (refer to Figure 4; a brief monitoring sketch appears after item (c) below).

Figure 4. Power management in ECP software stack: Exa-PAPI – The Exascale Performance API
(c) Operating System and Resource Management for Exascale (Argonne National Laboratory): This work focuses on "improving and augmenting" the Argo Operating System and associated resource-management frameworks. The goal is to deliver and enhance high-quality software mechanisms and policies that leverage the capabilities of ECP systems, including Intel's GEOPM (refer to Figure 5).

Figure 5: Power management in ECP software stack: Argo – Operating System and Resource Management for Exascale
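As a brief illustration of the monitoring described in item (b), the sketch below measures the energy consumed around a region of interest using PAPI's native-event interface. The event name "rapl:::PACKAGE_ENERGY:PACKAGE0" is an assumption: the energy and power components and events that are actually available depend on the platform and on how PAPI was configured (they can be listed with papi_native_avail).

```c
/*
 * Minimal sketch: measuring energy around a region of interest with PAPI.
 * The native event name below is an assumption; available events depend on
 * the platform and on the components PAPI was built with
 * (list them with `papi_native_avail`).
 */
#include <stdio.h>
#include <papi.h>

int main(void)
{
    int eventset = PAPI_NULL;
    long long energy = 0;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI initialization failed\n");
        return 1;
    }
    if (PAPI_create_eventset(&eventset) != PAPI_OK ||
        PAPI_add_named_event(eventset, "rapl:::PACKAGE_ENERGY:PACKAGE0") != PAPI_OK) {
        fprintf(stderr, "energy event not available on this platform\n");
        return 1;
    }

    PAPI_start(eventset);
    /* ... region of interest, e.g. a compute kernel ... */
    PAPI_stop(eventset, &energy);

    /* The RAPL component reports energy in component-specific units. */
    printf("package energy consumed: %lld\n", energy);
    return 0;
}
```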
3.1.1.1. Notes on the GEOPM framework
The role of GEOPM (Global Extensible Open Power Manager) was mentioned in reference to two of the ECP projects above: "Runtime System for Application-Level Power Steering" and "Operating System and Resource Management for Exascale". The goal of this project is to provide the community with an open, robust, flexible, and scalable platform for research on advanced application-aware energy management strategies. GEOPM is a runtime that runs asynchronously on compute nodes, monitors the application's behavior through lightweight profiling, and then leverages learning and control-system techniques to discover runtime patterns in the application and tune knobs in the underlying hardware platform to exploit those patterns (Figure 6). It is open source software (BSD 3-clause license) and supports extensible tuning strategies through plugins. Examples of pre-bundled plugins include one that improves application time-to-solution subject to a bound on job power and another that improves energy-to-solution subject to a bound on the impact on time-to-solution. The EE HPC WG workshop at SC17 included sessions discussing experiences with GEOPM at sites such as STFC/Hartree (slides), LLNL (slides), and LRZ (slides). Ongoing collaborations with Argonne, Sandia, and CINECA were also announced.

Figure 6: GEOPM Interfaces and HPC stack integration (Slides)
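To illustrate what GEOPM's lightweight profiling looks like from the application side, the sketch below marks up a compute loop with GEOPM's C profiling interface. The function names follow geopm.h, but the hint value, launch mechanics, and link flags differ across GEOPM versions, so treat the details as assumptions and consult the documentation of the installed version.

```c
/*
 * Minimal sketch of GEOPM's application profiling markup (geopm.h).
 * Hint values, launch mechanics, and link flags differ across GEOPM
 * versions; treat the details below as assumptions.
 * Build (typical): mpicc demo.c -lgeopm; run under GEOPM's job launcher.
 */
#include <stdint.h>
#include <mpi.h>
#include <geopm.h>

static void compute_step(void)
{
    /* Placeholder for the application's kernel. */
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; ++i)
        x += i * 1.0e-9;
}

int main(int argc, char **argv)
{
    uint64_t rid = 0;
    MPI_Init(&argc, &argv);

    /* Register a named region so the runtime can learn its behavior. */
    geopm_prof_region("main_compute", GEOPM_REGION_HINT_COMPUTE, &rid);

    for (int step = 0; step < 100; ++step) {
        geopm_prof_enter(rid);   /* lightweight profiling hook */
        compute_step();
        geopm_prof_exit(rid);
        geopm_prof_epoch();      /* marks one pass of the outer loop */
    }

    MPI_Finalize();
    return 0;
}
```

When the job is launched under the GEOPM runtime with one of its plugins (for example, the strategy that improves time-to-solution under a job power bound), these markers allow the controller to attribute time and energy to regions and to shift power to where it helps most.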
3.1.2. Synergy and Compatibility between Open Interfaces and Frameworks for Power Management:
Another event organized by the Energy Efficient HPC Working Group at SC17 was the Birds of a Feather (BoF) session focused on updating the community on the current status of open frameworks for monitoring and controlling power consumption. This is a timely effort given the growing number of power-management knobs introduced by vendors with each new generation of hardware platforms. The session focused on three such efforts:
- PowerAPI: The Power API is a portable system software API that aims to facilitate the communication of power-management information across different layers of the system software stack. The API encompasses power management at various levels of granularity, ranging from facility-level concerns down to low-level hardware and software interfaces. During the BoF session, updates on Cray's implementation and the reference implementation were presented (Slides). (A minimal measurement sketch appears after this list.)
- GEOPM: The Global Extensible Open Power Manager is a runtime for in-band power management and optimization. This open source software (BSD 3-clause license) has an extensible plugin-based architecture that enables application awareness while striving for job-wide global optimization in performance and energy/power consumption (Slides).
- Redfish: This is an open industry-standard specification and schema that enables monitoring and control of IT infrastructure. It provides a RESTful (Representational State Transfer) API that supports both in-band and out-of-band monitoring of system components that comply with the specification. During the BoF session, a Redfish-based agent was discussed that allows users of Power API implementations to access and control energy- and power-related metrics of the hardware platform (Slides). (A query sketch appears after this list.)
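As a flavor of how the Power API is used from the application side, here is a minimal measurement sketch following the published specification. The header name, context role, and attribute below are assumptions in the sense that which objects and attributes are actually exposed (and whether limits may be set) depends on the implementation and on the caller's privileges.

```c
/*
 * Minimal measurement sketch following the Power API specification.
 * Header and type names follow the spec and its reference implementation;
 * which attributes an object actually exposes is implementation-dependent.
 */
#include <stdio.h>
#include <pwr.h>

int main(void)
{
    PWR_Cntxt cntxt;
    PWR_Obj   self;
    PWR_Time  ts;
    double    watts = 0.0;

    if (PWR_CntxtInit(PWR_CNTXT_DEFAULT, PWR_ROLE_APP, "epa_demo", &cntxt) != PWR_RET_SUCCESS) {
        fprintf(stderr, "no Power API implementation available\n");
        return 1;
    }
    PWR_CntxtGetEntryPoint(cntxt, &self);

    /* Instantaneous power draw of the entry-point object (typically the node). */
    if (PWR_ObjAttrGetValue(self, PWR_ATTR_POWER, &watts, &ts) == PWR_RET_SUCCESS)
        printf("current power draw: %.1f W\n", watts);
    else
        fprintf(stderr, "PWR_ATTR_POWER not supported on this object\n");

    PWR_CntxtDestroy(cntxt);
    return 0;
}
```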
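The Redfish side of the discussion reduces to authenticated HTTPS requests against a standard schema. The sketch below queries a chassis Power resource using libcurl; the BMC hostname, chassis identifier, and credentials are placeholders, and the exact resource path can vary by vendor and Redfish version. The returned JSON typically carries PowerControl entries with fields such as PowerConsumedWatts and PowerLimit.

```c
/*
 * Minimal sketch: querying node power draw from a Redfish-compliant BMC
 * with libcurl.  The endpoint, chassis ID, and credentials are placeholders;
 * the resource path can vary by vendor and Redfish version.
 */
#include <stdio.h>
#include <curl/curl.h>

/* Print the JSON response body; it typically contains
 * PowerControl[*].PowerConsumedWatts and PowerLimit. */
static size_t print_body(void *buf, size_t size, size_t nmemb, void *userp)
{
    (void)userp;
    fwrite(buf, size, nmemb, stdout);
    return size * nmemb;
}

int main(void)
{
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;

    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://bmc.example.com/redfish/v1/Chassis/1/Power");
    curl_easy_setopt(curl, CURLOPT_USERPWD, "admin:password");  /* placeholder credentials */
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, print_body);
    /* Many BMCs ship self-signed certificates; verify properly in production. */
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);

    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : 1;
}
```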
3.2. First worldwide survey of active energy- and power-aware job scheduling and resource management solutions at HPC sites
Another noteworthy Birds-of-a-Feather session, attended by representatives from multiple HPC sites, was titled "State of the Practice: Energy and Power-Aware Job Scheduling and Resource Management (EPA-JSRM)". This interactive BoF session discussed results from a first-of-its-kind global survey of HPC sites that are actively investing in EPA-JSRM solutions (refer to Figure 7). The survey was conducted by members of the EE HPC WG in 2016-2017 and included responses from 11 HPC centers worldwide. Detailed responses from the sites were also published as a poster and a white paper during the conference week.
The members of the working group solicited further feedback from the community in order to increase active participation in this field.

Figure 7: List of participating HPC sites investing in energy- and power-aware job scheduling and resource management (BOF Slides)
3.3. Experts' input on the role of the software stack

Figure 8: Subset of questions used to drive the Panel (Slides)
One of the notable events on the final day of the conference was the panel session titled, “Energy Efficiency Gains from Software: Retrospectives and Perspectives”. The moderator and the panelists (Dan Reed, Satoshi Matsuoka, Sadaf Alam, Bill Gropp, and John Shalf) underlined the importance of exploring the software stack to leverage power savings (Slides). More details about the session can be found here.

Figure 9: A slide depicting the impact on system efficiency due to optimizations within the software stack at the Swiss National Supercomputing Centre (Slides)
4. References
[1] European Exascale Projects, http://exascale-projects.eu/
[2] Exascale Computing Project (ECP), US Department of Energy’s Office of Science (DOE-SC) and National Nuclear Security Administration (NNSA), https://www.exascaleproject.org/
[3] Flagship 2020 Project, http://www.aics.riken.jp/aicssite/wp-content/uploads/2017/05/Flagship_2020_Project_2015.pdf
[4] “China Plans 2019 Exascale Machine To Grow Sea Power”, HPCWire, August 23, 2017, https://www.hpcwire.com/2017/08/23/china-plans-2019-exascale-machine-grow-sea-power/
[5] “DOE’s Path Forward and getting to Exascale”, Jim Ang, Energy Efficiency HPC Workshop, Supercomputing Conference 2017, https://eehpcwg.llnl.gov/documents/conference/sc17/sc17_workshop_1530_exascale_ang.pdf
[6] “GEOPM at LRZ”, Jonathan Eastep, Energy Efficiency HPC Workshop, Supercomputing Conference 2017, https://eehpcwg.llnl.gov/documents/conference/sc17/sc17_workshop_1600_geopm_eastep.pdf
[7] “Power API Overview”, ‘Power API, GEOPM and Redfish: Open Interfaces for Power/Energy Measurement and Control’ Birds of a Feather Session, Supercomputing Conference 2017, https://eehpcwg.llnl.gov/documents/conference/sc17/sc17_bof_1715_2_powerapi_overview.pdf
[8] “GEOPM Overview”, ‘Power API, GEOPM and Redfish: Open Interfaces for Power/Energy Measurement and Control’ Birds of a Feather Session, Supercomputing Conference 2017, https://eehpcwg.llnl.gov/documents/conference/sc17/sc17_bof_1715_4_geopm_overview.pdf
[9] “Redfish Overview and Redfish Agent for Power API”, ‘Power API, GEOPM and Redfish: Open Interfaces for Power/Energy Measurement and Control’ Birds of a Feather Session, Supercomputing Conference 2017, https://eehpcwg.llnl.gov/documents/conference/sc17/sc17_bof_1715_3_redfish_overview.pdf
[10] Whitepaper, “Energy and Power Aware Job Scheduling and Power Management”, Energy Efficient HPC Working Group, Working draft, https://eehpcwg.llnl.gov/documents/conference/sc17/sc17_bof_epa_jsrm_whitepaper_110917_rev_1.pdf
[11] Slides, ‘State of the Practice: Energy and Power Aware Job Scheduling and Resource Management’, Birds of a Feather Session, Supercomputing Conference 2017, https://eehpcwg.llnl.gov/documents/conference/sc17/sc17_bof_epa_jsrm.pdf
[12] InsideHPC article, "SC17 Panel: Energy Efficiency Gains From Software", Rich Brueckner, November 2017, https://insidehpc.com/2017/11/sc17-panel-energy-efficiency-gains-software/
[13] “Interfacing GEOPM with PowerStack”, ‘Power API, GEOPM and Redfish: Open Interfaces for Power/Energy Measurement and Control’ Birds of a Feather Session, Supercomputing Conference 2017, https://eehpcwg.llnl.gov/documents/conference/sc17/sc17_bof_1715_6_geopm_llnl_experience.pdf
[14] “Porting of GEOPM to IBM Power8 with NVLink microarchitecture”, ‘Power API, GEOPM and Redfish: Open Interfaces for Power/Energy Measurement and Control’ Birds of a Feather Session, Supercomputing Conference 2017, https://eehpcwg.llnl.gov/documents/conference/sc17/sc17_bof_1715_7_geopm_stfc_experience.pdf
[15] “A Journey to Exascale Computing”, W. Harrod, https://science.energy.gov/~/media/ascr/ascac/pdf/reports/2013/SC12_Harrod.pdf
[16] Raw data for the plot on microprocessor trends (2011-2015), https://www.karlrupp.net/wp-content/uploads/2015/06/40-years-microprocessor-trend-data.zip
[17] Blog article, Karl Rupp, “40 Years of Microprocessor Trend Data”, https://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/
[18] List of Top 500 systems, https://www.top500.org/lists/
[19] Poster, “P90: Global Survey of Energy and Power-Aware Job Scheduling and Resource Management in Supercomputing Centers”, Siddhartha Jana, Gregory A. Koenig, Matthias Maiterth, Kevin T. Pedretti, Andrea Borghesi, Andrea Bartolini, Bilel Hadri, Natalie J. Bates, http://sc17.supercomputing.org/presentation/?id=post236&sess=sess293
[20] Poster, "P95: GEOPM: A Scalable Open Runtime Framework for Power Management", Siddhartha Jana, Asma H. Al-rawi, Steve S. Sylvester, Christopher M. Cantalupo, Brad Geltz, Brandon Baker, Jonathan M. Eastep, http://sc17.supercomputing.org/presentation/?id=post176&sess=sess293
[21] Energy Efficient HPC Working Group, https://eehpcwg.llnl.gov/
[22] Technical tracks organized by the EE HPC WG, https://eehpcwg.llnl.gov/pages/conf_sc17a.htm
Watch the video: SC17 Panel: Energy Efficiency Gains From Software