Transcript of Green HPC Episode 3: What do I get out of going green?
This is the transcript for episode three of the Green HPC podcast series, What do I get out of going green? You can find out more about the series at the Green HPC podcast series home page, and you can listen to the audio of this episode, find out more about the speakers, and get access to links and presentations that they’ve suggested at the episode 3 homepage.
This episode sponsored by Cray Inc.
[0:00] Going green does not mean you have to sacrifice performance. Proven at the petascale level, Cray’s innovative Ecoflex cooling technology gives scientists both supercomputing power and energy efficiency. Visit cray.com to learn more.
Christian Belady:
[0:16] I can change my form factor and make it a 2U server for that same single board. I could have bigger heat sinks, I could have a bigger cooling solution. I could now still maintain the same processor temperatures because now I have more room.
[0:31] Now potentially I could run in a 50 degree C environment. If I had a server that operated in 50 degree C, I could essentially put that server anywhere on the globe in the outside and not need conditioning.
[0:43] [music]
John West:
[0:54] That was Microsoft’s Christian Belady, who we heard quite a bit from in the last episode. This is insideHPC’s podcast series on green HPC. I’m your host, John West, and the editor of insidehpc.com. Christian was talking about change and the forces that drive change.
[1:11] We’ve spent many years racing to denser and denser solutions. Essentially maximizing the utility of machine room space in the absence of other first order concerns. But the changes that the industry is seeing in terms of factors that are shaping buying and system management decisions are creating a whole new set of parameters that will inform and influence design. 50 degrees C is about 120 degrees Fahrenheit, which is remarkably hotter than my 65 degree machine room today. To think about that change is really remarkable.
[1:44] Today on the green HPC podcast, we continue to ask some important questions and offer some strong and well informed opinions about where we’re headed and how green is going to change the way you run your supercomputing data center tomorrow. We’re going to look at what are the reasons that people have for caring about green in HPC, and in particular what do large data centers get out of going green? Change is hard, right? So why are data center owners and managers making change?
[2:15] When you first tune into this green supercomputing conversation, you’ll find a lot of different people talking about really three different ways of selling the idea. People talk about the very pragmatic approach, where you look at a fixed pot of money available to run a machine over its lifetime. The less money you have to put into paying for things like power and cooling, the more you can invest in the computer itself on the front end. These people want to minimize the energy that they use as a way of maximizing the computing that they can buy.
Pete Beckman:
[2:47] So if we just plot the cost, it’s an increasing cost in our data center and so we have to contain it. But the second is also a responsibility, is that when I can reduce infrastructure demands on the state of Illinois even if it were free, even if I didn’t pay a differentiated cost, doing that is to the good of everyone. And so we should try.
John:
[3:12] That was Pete Beckman at Argonne’s Leadership Computing Facility in Chicago talking about the way that he looks at these issues. Addressing the financial stewardship concerns, which are real, because a lot of supercomputing around the world is done with the public’s money. And also bringing in the environmental angle.
[3:28] As we’ve heard already in this series, the most obvious take on green computing, doing it to save the environment, is in some ways the least compelling of the arguments. It’s a concern, but it doesn’t seem to be the one that gets people out of bed in the morning. That’s what we want to talk about on today’s show. People don’t generally make a change without incentive, and we wanted to understand what the incentives in favor of taking green measures in HPC are and which ones are shaping customer behavior.
[3:58] In this episode we’re going to talk to a few of the HPC companies about what their customers are telling them. We’ll hear from Steve Scott at Cray and also Sumit Gupta at Nvidia about what their customers are looking to get out of taking green steps in HPC.
Steve Scott:
[4:12] Some people are facility constrained and they can only cool so much, so they have a limit of so many kilowatts that they can supply or cool in their facility. But everybody has to pay the power bills.
John:
[4:25] One of the stories that I think really encapsulates many of the forces driving us toward green computing today comes from IBM. We talked to IBM’s Dave Turek about what the company was thinking 10 years ago, when it started thinking about Blue Gene, and what lessons the success of that machine in doing lots of computations in an energy efficient way has for us today.
Dave Turek:
[4:47] IBM’s perspective on green supercomputing, if that’s an appropriate phrase, really commenced in earnest in the late 1990s. We were doing two things simultaneously.
[4:58] One, we made a judgment that the application domain for supercomputing was going to expand. We were on the verge of exiting the decade or maybe even the century of physics and were about to enter the era driven by life sciences. At the same time, as part of an annual strategic outlook that we execute through our research division, we saw what the implications were of the then-current microprocessor design in terms of the way systems were being built for supercomputing applications. Those two thoughts came together and gave rise in December of 1999 to the birth of the thought that led to Blue Gene. That design, which of course is still with us today and increasing in popularity, is really predicated on the notion that the future will really be a future dictated by energy constraints, space constraints, and an embrace, as a result, of massive scalability based on lower power cores but many of them.
John:
[6:01] So did it just happen that the life sciences thing was there as well? Or is there something about life sciences that pointed to low power as well?
Dave:
[6:11] No, life sciences per se didn’t point to low power, but if you recall back in that era, it was really when the notion of petascale computing was beginning to be talked about. In fact, the sort of classical application people were pointing to was protein folding. Because it was calculated that it would require a petaflop computer to work for a year to really model the folding of a single, fairly simple protein. So that was the stalking horse for the whole discussion around how we would get to petascale computing, the architectures required, etc. It could just as well have been an application of some other domain. But in that time frame, the market phenomenon was that there was this great rush of money and intellect and energy into the life sciences space. So those things came together and proved to be the catalyst for how we began to think about this.
John:
[7:08] Did you set out from the point of view of trying to minimize energy? Or did you set out from the point of view of trying to maximize the compute that you can provide within say a fixed energy budget?
Dave:
[7:22] We didn’t see those two issues delinked at all. We saw them as intimately connected and we see them to be connected that way in the future completely. So as you may recall, at the time we were on the verge of seeing the birth of the Earth Simulator and so on, and that was a heavily energy consumptive system. It was the last of the dinosaurs in terms of how people were designing high end supercomputers at the time. Which put a premium on a fast microprocessor and vectors and all these other kinds of things.
[7:58] The consequence was what it was. Two football fields of space, a huge amount of energy consumed and a fair amount of compute power relative to what was standard for the day. But we didn’t see that as having any future. We saw that as being fundamentally a dead end and we needed to move beyond that. So in other words, what we didn’t want to do was simply build a faster computer that required three football fields and required half of a nuclear power plant to power it. That would have been an embrace of ideas that we thought at that time were already dead.
John:
[8:34] Because there aren’t a lot of customers that can get a whole power plant just to run their computer? Or because it’s not a responsible way to behave in the sense of carbon footprint and managing the environmental change?
Dave:
[8:47] Well, I would say in 1999, the language of the environmental movement today with respect to carbon footprint and green et cetera, was not as prominent as it is today. I think back then it was really the way the market was working that essentially said that energy is equivalent to dollars. The more energy you consume the more dollars you’re spending, and as a consequence you need to back away from the abyss and move in architectural directions that are actually reasonable and affordable to people trying to solve very, very important kinds of problems.
John:
[9:22] Do you think your customers today have really changed? I mean, I know the language has changed. But do you think they are still concerned about energy and operating costs so they can have more compute? Or do you think they are now concerned really fundamentally with carbon footprints and impact on the environment?
Dave:
[9:39] I really think when we talk carbon footprint and impact on the environment, it moves into more of a qualitative view of what’s going on. And by and large when people are working in this area they like to get things as precise and as quantitative as possible. So all issues, whether we were to deem them, oh I don’t know, subjective, qualitative, political, whatever term you like to use, have to be monetized to make sense to a businessman.
[10:10] So in that sense, what we have seen now for at least five years and maybe longer, has been the progressive embrace by the community, across the board, U.S. and non-U.S., of really focusing on key cost elements in terms of how it drives the use of IT and computing in general.
Steve:
[10:32] Basically better power efficiency turns into lower cost of ownership, which is good for Cray, good for our customers, completely independent of any environmental benefit.
John:
[10:41] That’s Steve Scott at Cray. He started out from sort of the same place with the idea that the costs in the system, both upfront and ongoing costs, matter. But then he extended that idea to recognize the fact that the raw numbers have to be put in perspective of the work that gets done with the system. And intrinsically, at least, the value of that work to whoever is paying for it.
Steve:
[11:04] So green computing simply means lower energy costs, and that saves us money and makes us more competitive. So, it’s simply about trying to get more work done for the same amount of energy.
[11:17] One thing that I would like to point out when I talk to people about this is that what really matters is sustained performance per watt. Just like sustained performance per dollar, not peak performance per watt. When we think of green computing, people typically talk about their peak performance per watt, or their Linpack performance per watt as is used for the Green500. But sustained performance is what matters and there is a lot more upside on sustained performance per watt than there is for peak performance per watt. People talk about power usage effectiveness and trying to get the PUE from 1.5 to 1.2, as close to one as possible. But a computer with a PUE of 1.2 has an upside of about 20 percent that they could gain in overall system efficiency, power system efficiency. But a computer that is sustaining only five percent of peak has an upside of 20 times. So what fraction of peak you sustain is actually much more important than just the PUE of the machine. And that’s what it’s really all about.
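[Editor's note: Steve's comparison can be checked with a few lines of arithmetic. The sketch below is an illustration added to the transcript, not something presented in the episode. The function names are hypothetical, and it assumes the ideal PUE is 1.0 and that "upside" means how much more work per watt is available if the inefficiency were removed entirely.]

```python
def pue_upside(pue: float) -> float:
    """Fractional gain in work per facility watt if PUE dropped to the ideal 1.0.

    PUE is total facility power divided by IT power, so at a fixed IT load
    the facility draws `pue` times what it would at PUE = 1.0.
    """
    return pue / 1.0 - 1.0


def sustained_upside(fraction_of_peak: float) -> float:
    """Factor by which delivered work could grow if an application
    sustained 100% of peak instead of `fraction_of_peak`."""
    return 1.0 / fraction_of_peak


# The two cases from the discussion: a PUE of 1.2 and 5% of peak sustained.
print(f"PUE 1.2 -> 1.0: about {pue_upside(1.2):.0%} more work per watt")
print(f"5% of peak -> peak: about {sustained_upside(0.05):.0f}x more work per watt")
```

[Under these assumptions the facility-side gain is bounded at roughly a fifth, while the application-side gain is a multiple, which is the point of the comparison.]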
John:
[12:25] So that then goes to that’s very workload dependent. So to the extent that that is possible, that goes to buying a machine that is more suitable to your workload.
Steve:
[12:34] Absolutely. So in a sense the buying criteria isn’t all that much different from when you are looking at performance per dollar. What you really want to do is understand your workload and have benchmarks representative of the workload that you are doing. And then measure a prospective machine on those benchmarks to see what kind of sustained performance you get. The only difference is that performance per watt is a criterion as well as performance per dollar.
John:
[13:01] And of course diverse workloads like those that you find in the DOD or the TeraGrid or other large HPC programs make this process hard, but as Steve says, buying computers is complicated.
[13:14] But what about those buyers? When they come in, are they coming in with an environmental mandate?
Steve:
[13:20] It’s not so much an environmental mandate, although it’s nice that the environmental and the business motivations in this case are very much aligned. It really has to do with two things. From a system perspective, it’s cost of ownership. Some people are facility constrained, in that they can only cool so much, so they have a limit as to how many kilowatts they can choose, that they can supply or cool in their facility. But everybody has to pay the power bills.
John:
[13:47] One rule of thumb that Steve gave me is that a computer that draws a megawatt of power takes about a megadollar a year to run. A megawatt-sized computer itself isn’t that unusual in the context of the Top500. You will find megawatt-sized computers down the list as far down as 100 or even further. But aside from the prudent management of financial resources angle, the energy required to run large installations of today’s processors is actually limiting the work, the operations, and the computing we can do because of the ways in which it impacts chip and system designs.
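[Editor's note: the megawatt-to-megadollar rule of thumb can be sanity-checked with simple arithmetic. The sketch below is an illustration added to the transcript. It assumes the machine draws constant power all year, and the function names and the 0.10 dollars per kilowatt-hour example price are hypothetical, not figures from the episode.]

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_cost(megawatts: float, dollars_per_kwh: float) -> float:
    """Yearly cost of running a constant load of `megawatts`
    at an all-in electricity price of `dollars_per_kwh`."""
    kwh_per_year = megawatts * 1000 * HOURS_PER_YEAR
    return kwh_per_year * dollars_per_kwh


def implied_rate(megawatts: float, dollars_per_year: float) -> float:
    """All-in price per kWh implied by a given yearly bill."""
    return dollars_per_year / (megawatts * 1000 * HOURS_PER_YEAR)


# What price per kWh makes "1 MW costs $1M a year" exactly true?
print(f"Implied rate: {implied_rate(1.0, 1_000_000):.3f} dollars per kWh")

# Hypothetical example price, chosen only for illustration.
print(f"1 MW at 0.10 dollars per kWh: {annual_energy_cost(1.0, 0.10):,.0f} dollars per year")
```

[A constant megawatt is 8.76 million kilowatt-hours a year, so the rule holds when the all-in price is a little over eleven cents per kilowatt-hour.]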
Steve:
[14:24] But if we look at the underlying technology at the node, power is also starting to become a design constraint. That is, we have to limit what we can put on a piece of silicon because of the heat, not because of the transistor space. That is, we cannot afford to fill up our chips with transistors and run them at full speed because the chips will melt.
John:
[14:48] Of course, Steve is talking there primarily about the kinds of chips most of the computer manufacturers today build supercomputers out of. And with a few exceptions those chips are the x86 architecture from Intel and AMD. But there are other people working on other approaches to computing.
[15:04] And an old idea that has become new again in HPC is acceleration. In particular, acceleration with graphics processing units, or GPUs. There are definitely challenges with using GPUs with traditional HPC applications, and a lot of those challenges arise from the programming side, where you have to map one kind of computation onto an architecture that was fundamentally built to do something else. This complicates things. But when there is a good match between the application and the GPU, the speedups can be significant. Which goes back to Steve’s point of getting the most out of the flops you are buying being a very effective green computing strategy.
[15:42] So I talked to Sumit Gupta of Nvidia, probably the dominant provider of GPUs in computing today, about whether customers come to them in order to minimize acquisition costs, to get performance, or specifically to minimize energy.
Sumit Gupta:
[15:56] So I think what we find is power consumption and the need for more energy efficient computing systems is at the top of the mind of most of our customers.
[16:08] So they come to us typically because they want higher performance, because they are not able to either build a CPU cluster big enough to meet their requirements or just not able to get the response time that they need. I think it’s a welcome surprise to them that they can actually meet their performance objectives and do it at a smaller power requirement. More often than not, and the classical example that we have is from BNP Paribas, which is a bank in France. They engaged with us initially because they needed higher performance on their pricing algorithm. But by the end of it, what they found was the energy saving that they were getting by using GPUs, at the same time meeting their performance requirement, was a huge win. In fact, that became a central part of their message to the community. Their press release talked more about the green computing aspects of GPUs than about the performance benefits that they got.
John:
[17:14] That’s it for this episode from the green HPC podcast series from insidehpc.com. You can find out more about the topics and the people in this episode by going to insideHPC and clicking on the link for the green HPC podcast series. Until the next episode, I’m John West, from all of us here on insideHPC.com, thanks for listening.
This episode sponsored by Cray Inc.
[17:37] Cray’s the leader in green supercomputing solutions. Proven at the petascale, our Ecoflex technology allows computers to operate at unprecedented speeds, while allowing for significant energy savings and data center flexibility.
[17:51] Ten times more efficient at removing heat than water, Cray’s Ecoflex cooling does not compromise performance and scalability. Uncompromising performance. Unparalleled design. Cray is the supercomputer company. Visit Cray.com to learn more.
[closing music]