In this special guest feature, Tom Wilkie from Scientific Computing World writes that software approaches to energy efficiency in HPC may yield unexpected improvements in the hardware of next-generation mobile phone networks.
International efforts to improve energy efficiency in high-performance computing may, somewhat surprisingly, make mobile phone communication cheaper and faster.
At ISC 2015 in Frankfurt in July, Allinea, the software development tools company, is expected to unveil a new addition to its range: an energy profiling tool. The following month, Adept, a European research project addressing the energy-efficient use of parallel technologies, is expected to release a set of benchmarks that it has developed to characterize the energy consumption of programming models on different architectures. Meanwhile, another EU-funded project, Deep, is taking a holistic approach to energy efficiency, using software to optimize the energy consumption not just of the computer but of the whole building.
These developments underscore not only the rising importance of energy efficiency to the future of high-performance computing, but also the growing interest in reducing power consumption by means other than the direct approach of developing more efficient cooling systems.
Estimating energy in advance
One of the goals of the Adept project is to address the problem of how software developers can know how much more (or less) energy their code will consume if it is run on a different architecture, for example when moving from OpenMP to CUDA. The point is to predict the energy consumption without having to go through the labour of actually rewriting and porting the code, running it on the alternative architecture and measuring the results, only then to find out whether the whole time-consuming exercise was worth it.
The Adept approach has been to create a tool that builds a ‘model’ of computer code to estimate the effect of running parts of the code on, say, GPUs rather than CPUs. For a piece of application software, the tool creates “a fingerprint of instructions and a model that’s independent of hardware, from which we can predict power usage,” according to Nick Johnson, Applications Consultant at the Edinburgh Parallel Computing Centre (EPCC) in Scotland, which is coordinating the project. “Once you have a good model, you can plug in parameters for the target architecture that you’re thinking of running it on.”
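To make the idea of a hardware-independent fingerprint concrete, here is a toy sketch, not Adept's actual model: the fingerprint is assumed to be a count of operations per class, and each target architecture is described by hypothetical per-operation time and energy figures plus a static power draw. The same function can also be used the other way round, to pick the hardware that a known code would run on most efficiently. All class names, parameters and numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical per-architecture parameters plugged into the model."""
    name: str
    seconds_per_op: dict  # operation class -> seconds per operation
    joules_per_op: dict   # operation class -> dynamic energy per operation
    static_watts: float   # power drawn for the whole run regardless of activity


def predict(fingerprint, target):
    """Predict (seconds, joules) for a hardware-independent fingerprint.

    `fingerprint` maps an operation class (e.g. "flop", "mem") to a count.
    Simplification: operation classes run one after another, never overlapping.
    """
    seconds = sum(n * target.seconds_per_op[op] for op, n in fingerprint.items())
    dynamic_j = sum(n * target.joules_per_op[op] for op, n in fingerprint.items())
    return seconds, dynamic_j + target.static_watts * seconds


def lowest_energy(fingerprint, targets):
    """The 'reverse' use: given the code, pick the hardware using least energy."""
    return min(targets, key=lambda t: predict(fingerprint, t)[1])


if __name__ == "__main__":
    fp = {"flop": 2_000_000_000, "mem": 500_000_000}  # made-up counts
    cpu = Target("cpu", {"flop": 1e-10, "mem": 1e-9}, {"flop": 2e-10, "mem": 1e-9}, 50.0)
    gpu = Target("gpu", {"flop": 1e-11, "mem": 2e-10}, {"flop": 5e-11, "mem": 5e-10}, 120.0)
    for t in (cpu, gpu):
        secs, joules = predict(fp, t)
        print(f"{t.name}: {secs:.2f} s, {joules:.1f} J")
    print("lowest energy:", lowest_energy(fp, [cpu, gpu]).name)
```

The fingerprint is computed once; only the `Target` parameters change between candidate architectures, which is what makes it cheap to explore alternatives without porting the code.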
He stressed that there is a trade-off between performance (in terms of run-time for an application) and energy efficiency. “For some fingerprints, you might find that performance is worse, but that may be acceptable if there’s a saving in power. You might be willing to have code take 50 per cent longer to run, if there is a 50 per cent saving in power.”
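Because energy is average power multiplied by run-time, Johnson's example can be checked with one line of arithmetic. Reading his “saving in power” as a 50 per cent cut in average power draw (an interpretation, not something the quote spells out):

```latex
E = \bar{P}\,t
\qquad\Longrightarrow\qquad
E' = (0.5\,\bar{P})\,(1.5\,t) = 0.75\,\bar{P}\,t = 0.75\,E
```

So a run that takes 50 per cent longer at half the average power uses 25 per cent less energy: the energy saving is smaller than the power saving because the lower power is drawn for longer.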
However, David Lecomber, CEO of Allinea, took a somewhat different view: “You would not want to optimize for energy, if it will cost you time as well.” If the run-time lengthens, users will complain, he noted drily. But there are many applications for which you can slow the processor down and save power without lengthening the run-time, he continued. For example, most processors spend a lot of their time idle, waiting for data to arrive from memory: they do not need to be working at full tilt, so they can run more slowly, and consume less energy, while still keeping pace with the data. Storage is an even more extreme example, because its data transfer is slower still.
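Lecomber's argument can be illustrated with a deliberately simple model of a memory-bound job. The sketch assumes that computation overlaps perfectly with memory transfers, so the run-time is whichever of the two takes longer, and that dynamic power grows roughly with the cube of clock frequency; the function name, constants and workload are hypothetical.

```python
def run(compute_cycles, memory_seconds, freq_ghz, static_w=20.0, k=10.0):
    """Toy model of a processor running one job at clock `freq_ghz`.

    Illustrative assumptions only: computation overlaps perfectly with memory
    transfers, so the run-time is whichever of the two takes longer; power is a
    static part plus a dynamic part that grows as f**3 (voltage scaled with
    frequency). Returns (seconds, joules).
    """
    compute_s = compute_cycles / (freq_ghz * 1e9)
    seconds = max(compute_s, memory_seconds)
    watts = static_w + k * freq_ghz ** 3
    return seconds, watts * seconds


if __name__ == "__main__":
    # A memory-bound job: 1.5e9 cycles of work, 1 s of unavoidable memory traffic.
    for f in (2.5, 2.0, 1.5, 1.0):
        secs, joules = run(1.5e9, 1.0, f)
        print(f"{f:.1f} GHz: {secs:.2f} s, {joules:.1f} J")
```

Between 2.5 and 1.5 GHz the run-time stays at one second while the energy falls, because the processor was waiting on memory anyway; only below 1.5 GHz does computation become the bottleneck, and the job starts to take longer.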
In this way, simply tuning an application program for better performance can also lead to lower energy costs. If the job takes less time to complete, then the energy consumption overall will be lower. After all, he remarked, the end-goal of all this is “science/Watt” – i.e. the amount of useful science (or engineering) that can be got out of a supercomputer, rather than the number of computations. Supercomputing sites ‘need to look at how their applications are running on their system’, he said.
Energy-efficient HPC and mobile phones
Over many years in HPC, parallelization and concurrent computation have been used to decrease the overall time that an application requires to run to completion. It’s widely accepted that recent advances in low-power multi-core processors from the mobile phone and embedded markets have widened the choice of hardware architectures available to HPC. What is perhaps less appreciated is that advances in such processors are also forcing programmers of embedded systems to use the sorts of parallel computing techniques that are familiar to HPC programmers.
Thus, while Adept, a three-year project which started in September 2013, has EPCC, Uppsala University in Sweden, and Ghent University in Belgium as participants, it also involves the telecoms company Ericsson AB from Sweden and a small Edinburgh-based company, Alpha Data, which specialises in FPGAs and the like for digital signal processing, imaging systems, communications, military and aerospace, as well as high-performance computing.
Ericsson’s interest in Adept stems from LTE – the Long Term Evolution of wireless data communications technology intended to increase the capacity and speed of wireless data networks using digital signal processing techniques. LTE also involves redesigning and simplifying the architecture of wireless communications networks, moving them to an IP-based system so as to significantly reduce transfer latency.
Mobile phone base stations have to cope with huge quantities of data, voice and video, and they have to do so with minimal energy consumption. Like all telecoms companies, therefore, Ericsson faces choices about which hardware to invest in, and needs tools to help it decide what to buy next.
The Adept project is providing a way to explore the design space for software developers, but it also provides the reverse function: if you know your code, it can give you an idea of what hardware to buy to run it efficiently – something invaluable to LTE technology providers.
Adept started in 2013 and, according to Lecomber, Allinea too has been working to find ways of developing energy profiling as part of performance tools for the past couple of years, in partnership with the University of Warwick in the UK.
Allinea specializes in software development tools and in high-level benchmarking tools that allow people to understand the performance of their applications without needing to see the source code. Now the company is adding an energy-efficiency component to its toolkits. Lecomber pointed out that Allinea Performance Reports can be run on “a real code on a real workload, and you can see at a glance how much time you are spending in I/O; how much time you’re spending in MPI; thread synchronization down to the processor level; you can see how much time you spend getting stuff from main memory; and how much time doing floating point vector operations.” Adding the energy profiler “will guide you on how you can run the application to use less energy,” he said.
As an application runs, Allinea’s software will be taking energy measurements “so you can see the spikes in the energy over the execution of an application, and tie those spikes to areas of your code and focus on those. The tool will allow people to go almost down to the line level in looking at energy usage of their applications.”
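One simple way to tie energy to regions of code is to sample power at intervals, record which region was executing at each sample, and integrate power over time for each region. The sketch below does this with the trapezoidal rule; the sample format, region names and function name are hypothetical, and this is not a description of how Allinea's tool works internally.

```python
from collections import defaultdict


def energy_by_region(samples):
    """Attribute measured energy to code regions.

    `samples` is a time-ordered list of (time_s, power_w, region) tuples, where
    `region` names the code that was executing when the sample was taken.
    Each interval between consecutive samples is charged to the region active
    at its start, using the trapezoidal rule for the power over the interval.
    """
    totals = defaultdict(float)
    for (t0, p0, region), (t1, p1, _) in zip(samples, samples[1:]):
        totals[region] += 0.5 * (p0 + p1) * (t1 - t0)
    return dict(totals)


if __name__ == "__main__":
    trace = [(0.0, 100.0, "setup"), (1.0, 100.0, "solver"),
             (2.0, 300.0, "solver"), (3.0, 300.0, "io"), (4.0, 100.0, "io")]
    per_region = energy_by_region(trace)
    hottest = max(per_region, key=per_region.get)
    print(per_region, "-> focus on", hottest)
```

Finer sampling gives sharper attribution; with coarse samples, a short spike can be smeared across neighbouring regions, which is one reason sampling granularity matters.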
Measuring power consumption
But in order to reduce power consumption, it must first be measured. In the Adept project, EPCC’s role is energy measurement and developing the benchmarks, while Alpha Data is providing the power measurement board, which filters the data so as to get the highest-resolution power reading. According to Johnson, EPCC has a range of different types of hardware in its laboratory, so that it can test performance on different architectures and then provide feedback on the accuracy of the model’s predictions compared with the actual energy consumption.
For Allinea’s David Lecomber, “There is no perfect measuring system. There is always a slight delay, capacitance effects; even the granularity of the frequency of the sampling.” However, he pointed out that Intel provides RAPL (Running Average Power Limit), which gives a lot of processor-level data on energy, and vendors such as Cray have built in decent measuring systems that supply server-level energy information. “All the vendors are keen to make that information available, so you can see it in the operating system and understand the energy performance,” he said.
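On Linux, RAPL counters are commonly exposed through the powercap interface in sysfs. The sketch below assumes that the package-0 domain lives at `/sys/class/powercap/intel-rapl:0`, that `energy_uj` is a cumulative counter in microjoules which wraps at `max_energy_range_uj`, and that it wraps at most once during the measured region; the helper names are hypothetical. On many recent kernels these files are readable only by root, and the result covers the whole processor package rather than just the measured function, a small example of why no measuring system is perfect.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL domain in the Linux powercap interface.
RAPL = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name):
    """Read an integer microjoule value from the RAPL sysfs directory."""
    return int((RAPL / name).read_text())


def energy_delta_uj(before, after, max_range_uj):
    """Difference between two counter readings, allowing for one wrap-around."""
    if after >= before:
        return after - before
    return after + (max_range_uj - before)


def measure(fn, *args):
    """Run fn(*args); return (result, package joules, elapsed seconds)."""
    max_range = read_uj("max_energy_range_uj")
    e0 = read_uj("energy_uj")
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    e1 = read_uj("energy_uj")
    return result, energy_delta_uj(e0, e1, max_range) / 1e6, elapsed


if __name__ == "__main__":
    try:
        _, joules, secs = measure(sum, range(1_000_000))
        print(f"package 0: {joules:.2f} J over {secs:.2f} s")
    except OSError as exc:  # no RAPL domain here, or the counters need root
        print("RAPL counters unavailable:", exc)
```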
Allinea’s focus has been at the RAPL level and also on the energy consumption of accelerators such as GPUs. He pointed out: “You can see the spike in energy over the entire application every time you go out to a GPU. You are using more energy, but it is doing the work and the trade-off is clear. You are using more energy, but you are going to finish the computation an awful lot quicker.”
However, “if you are spinning up a GPU, you still have the CPUs running,” so Lecomber stressed the need to understand energy consumption at the level of the whole system, not just RAPL and the accelerator: “that’s what we are bringing in.”
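Lecomber's whole-system point can be written down directly. Splitting node power into CPU, GPU and “other” terms (memory, network, storage and the rest of the node) is an illustrative assumption; the bars denote average whole-node power over a run, with the offloaded run's average including the CPUs that keep drawing power while the GPU works:

```latex
\begin{aligned}
E_{\mathrm{job}} &= \int_0^{T} \bigl( P_{\mathrm{CPU}}(t) + P_{\mathrm{GPU}}(t) + P_{\mathrm{other}}(t) \bigr)\,dt = \bar{P}_{\mathrm{node}}\,T, \\
\text{offloading saves energy} &\iff \bar{P}_{\mathrm{offload}}\,T_{\mathrm{offload}} < \bar{P}_{\mathrm{host}}\,T_{\mathrm{host}}.
\end{aligned}
```

Measuring only the GPU term would miss the host power that is drawn for the whole of the offloaded run.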
Axel Auweter, team leader for energy efficiency in the Deep project, has an even wider concept of the whole system. The electricity consumed by a supercomputer is rejected to the environment as heat and, in many countries, it can be used for heating offices and spaces in the rest of the building. But this requires water at a particular temperature, dependent on the outside environment as well as what is going on inside the machine itself.
Deep is investigating hardware prototypes, not just software, and the machine is lavishly equipped with energy and heat sensors to monitor its operations. But the wider environment (in principle the building itself) and its demand for energy are also being taken into account and the system software has been designed to optimise energy consumption as a whole, he said. This could even mean that the operating system runs the computer less efficiently, in terms of energy consumption, in order to ensure that the whole system decreases its demand for energy.
Optimize applications, not systems
Allinea has been testing its energy-efficiency software for some months, ahead of the formal launch. Lecomber said: ‘On some codes, you could slow the processor down by 10 to 15 per cent – and reduce the energy by similar amounts – yet the computation finished in exactly the same time.’ The outcome underscores his theme that the best route to energy efficiency is to optimise for performance.
He stressed that the issue is not the performance of the site’s HPC system, as measured by the canonical Linpack benchmark used to draw up the Top500 list: “Linpack is irrelevant if your HPC system is doing, say, OpenFOAM.” Rather, the issue is the performance of each individual application on that system: “Sites really need to get serious on the benchmarking of applications for performance.”
Lecomber warned: “It is very easy to use an HPC system inefficiently. If your users are eating hours of cluster time, but their application is poorly configured – because it is using too many MPI processes and has lost efficiency by doing that, or maybe the thread/process balance is bad and you’re losing half the time to synchronisation – that’s energy and time you’re losing. It’s too easy to assume that codes are well optimized for the systems they are running on. Benchmarking of applications will enable HPC centres to understand more about what’s going on in their system, and where their energy is actually going. That will lead to improvements in their throughput – the actual amount of science per Watt.”
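One concrete check behind this warning is strong-scaling efficiency: run the same job at several MPI process counts and compare the core-hours each run consumes. The sketch below treats core-hours as a rough proxy for energy, which assumes roughly constant power per core; the function name and timings are hypothetical.

```python
def scaling_report(runs):
    """Strong-scaling efficiency and core-hours for the same job.

    `runs` maps an MPI process count to the job's wall-clock seconds at that
    count. Efficiency is measured relative to the smallest run:
    (p0 * t0) / (p * t), so 1.0 means no core-hours were wasted by scaling out.
    """
    p0 = min(runs)
    t0 = runs[p0]
    report = {}
    for p, t in sorted(runs.items()):
        report[p] = {"efficiency": (p0 * t0) / (p * t), "core_hours": p * t / 3600}
    return report


if __name__ == "__main__":
    timings = {64: 3600.0, 128: 1800.0, 256: 1200.0}  # hypothetical timings
    for p, row in scaling_report(timings).items():
        print(f"{p:4d} processes: efficiency {row['efficiency']:.2f}, "
              f"{row['core_hours']:.1f} core-hours")
```

In this made-up case, going from 128 to 256 processes saves ten minutes of wall-clock time but consumes a third more core-hours.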