An Interview with Argonne’s Pete Beckman


As Director of the Exascale Technology and Computing Institute at Argonne National Laboratory, Pete Beckman has his finger on the pulse of exascale development. In this feature interview, Beckman talks about the need for substantial investment in science and technology education in the U.S., and the direct link to exascale computing. Beckman, along with many other community leaders, shares a deep concern over the possibility of inadequate funding for exascale development.


The Exascale Report: So what could possibly be anyone’s objection to providing adequate funding for exascale?

BECKMAN: Well, I don’t know if there’s ever any objection – I think that what happens in Washington is they have to make pretty difficult policy choices on where to do the investment. Do they invest in NIH research? Do they invest in a new NASA probe? There is a fixed number of dollars they have to invest in science R&D, and in some sense, the ones that will have the largest impact for the nation should be at the top of the list.

There is also science that gets funded that I would say is ‘curiosity driven’ – it’s not just about delivering results that will transform our understanding of something that matters to the country like energy or biology or medicine, but also just – we don’t know how something works out in space or down at the sub-atomic level and we just want to know.

TER: Some of the interesting discussion I’ve heard recently was around the notion that when we get the first one or two exascale systems, for a period of several years, they will be applied to a very limited number of applications. So the argument of “we need exascale because of all the possibilities of how it can change everything – from the way we live on this planet to healthcare and so forth”, is such a long term argument that people lose sight of it – because it’s just not something that’s going to give us a short-term benefit. Can you comment on this?

BECKMAN: I’m a very firm believer that these investments in science and technology are long term wins for the nation and for our country. You’ve probably seen the numbers of high school and college students who are in science and technology and where we are in this country with declining leadership just in education in this area. To fix that, of course you can apply funds for education, but those students need to leave and graduate and then have promising, exciting research projects and new technology that they are diving into. We have had tremendous leadership and growth in the Internet services, such as Google and Yahoo and Amazon, and cloud, and we’re in an amazingly dominant position there, but when it comes to the underlying technologies of chips and software and science and science-based new materials, new ways to get solar power, or new engines that are more efficient, that space can have a tremendous impact on our economy and on our life. If we could design through high performance computing – and improve the efficiency of an engine or a combustion engine or a jet engine, the amount of fuel that’s consumed, which we all pay for, can be dramatically lowered across the nation.

Pratt and Whitney ran code here at Argonne, and other folks have run codes at Oak Ridge and we’ve had GE running code here doing exploration, and having improved the efficiency of engines and the impact that we can have is tremendous, but it depends really on a full system – full court press. Everything from education up to the deployment of these large systems. And it’s not just a short-term win.

The grad students who start working on this now will turn into Ph.D. students and that will turn into post-docs, who then discover a new material. That’s a ten year win for us. That’s where we have to be sort of imagining our competitiveness in the world.

TER: I totally agree with you. A lot of folks get hung up on the term ‘exascale computer’ and picture this mammoth beast as opposed to thinking in terms of it being exascale levels of research – exascale levels of computation that we haven’t had before. Sure it requires that mammoth computer system to do this, but if we start to focus on picturing this large platform, that’s where the questioning starts to come in.

BECKMAN: Absolutely. That is how we look at it here. You know, the popular press likes to look at the Top 500, declare a winner, and then move to the next story. And you and I, in this community, know that is not how it works – that we have dozens of Ph.D. students and post-docs, more than dozens, hundreds, across the nation working in a field in which we, as a nation, have to be number one. The design wars at the commercial level. The science based understanding of our climate. And what we need to do will affect policy as well as discovery of new drugs and new therapies and genetics, and the impact of our society with climate change. These are all things for which we want to be the authoritative source. We need to have the experts in this country and continue to attract them here, and for that we need a big exciting moon shot. It has to be something that people really get excited about. And they have been getting excited about exascale computing because there’s a lot of new technology – it’s not just turning the crank on old stuff. The power constraint is very challenging. The parallelism – managing a billion threads of control. These require new concepts and new ideas and these are fantastic challenges for the Ph.D. students working on this.

TER: As you recall, we featured you about 18 months ago in the very first issue of The Exascale Report, and at that time, you had described the journey toward exascale as being more “evolutionary” as compared to “revolutionary”. Do you still feel the same way?

BECKMAN: Yes – it depends on your perspective though. We will have to have revolutionary components because we haven’t seen some of these pieces before. But the overall architecture of how people compute – using parallel decomposition of problems – that stuff is not likely to change. So, imagine for a minute, as an example, if we look at the revolution that’s happening in the car industry. Right now we’re moving to electric vehicles. We have the Nissan Leaf – we have the Chevy Volt. These are revolutionary. But if you look at them, there are many parts that are evolutionary. They still have four wheels. They still require a power train. They still require anti-lock brakes. And all of these things need to be updated and redesigned in light of the new realities of the machine and the control systems. But there are new parts – the lithium-ion battery, the electric motors. So for us, it will be evolutionary. It will still be a massively parallel system, but the new parts will be directly managing power management, directly managing resilience, and modifying our programming model to adapt to these new challenges. But it will still, in large part, be the kind of thing scientists have invested 20 years in. Just like the 100 years of car driving people have been doing here in the states, that won’t go away as we move to electric vehicles.

TER: That’s a great example. So, overall, if you think back about 18 months, do the challenges seem different to you than they did 18 months ago?

BECKMAN: Yeah – so we’ve had a set of meetings with the vendors and they’ve provided some responses and that has provided a very interesting level set with respect to what the technology companies believe are their big challenges. And if you look at what is happening in the software space, we also realize that currently there is not a global overarching plan in the United States. We had hoped that one would emerge, but it looks like it’s left on our ‘to do’ list – to build a software plan for how to get the software done for exascale. So it’s not just hardware. In fact, as you know, the hardware guys say that their job is the hardest and the software guys say that their job is the hardest – but what always ends up happening is the machine comes out and then several years later the software is done.

We might assume that the software is a very critical, if not the most critical part, so getting a jump on that now and getting some of these concepts of parallelism and power management resolved now would go a long way.

But things have changed. The face of exascale has changed. Of course internationally, other countries – based on our work with the IESP and other things – have become very involved. But there’s also a realization globally that leadership in this space will demonstrate – or be necessary – to lead other spaces, such as design and engineering, and health and climate.

TER: So we’ve had a lot of attention, particularly through the public media, about exascale research labs being announced, mostly in Europe, along with Co-design centers and initiatives, and working groups like IESP. Do you think we really made that much progress in 2011 in moving forward toward exascale?

BECKMAN: Well, in the states, we’ve done a lot of the research and I would say a lot of the leg work to build the program. I think that while we started working on the exascale planning three years ago, other countries have very quickly caught up. And as an example, if you look at Europe, the European Commission was presented with a first draft of a unified plan of what an exascale program might look like in October. The European Commission was very favorably disposed and very interested. The Europeans had commissioned an IDC report so they could understand their position in the marketplace. The Europeans had put together the economic impact, the science impact and what needed to be done. And they were asking the European Commission for several billion Euros over the next ten years. And the next stage for them is they are putting together, internally, a more refined plan which would then be part of the European Commission structure – their plan – and then they would of course issue calls.

In addition to that, the Europeans bootstrapped three small projects. They are relatively modest, but they show the direction and eagerness and plans for the Europeans. These are hardware and software projects to prototype possible exascale architectures. There are three of these and one of these is an all European design. It’s the ARM processor, put together by ST Micro and integrated by BULL. So ARM plus ST Micro plus BULL – if they can develop this – would give the Europeans a home grown prototype for exploring exascale.

I think they are very serious about this and have fantastic technology in this area. The iPhone I have in my pocket uses an ARM processor. And the iPads that were under everyone’s Christmas trees this year use the ARM processor. The technology is at the cutting edge for low power, but the next step is to see if that can be adapted for HPC.

I think the Europeans have done a fantastic job of thinking through these issues and putting together a plan which happened over the last 18 months. Essentially, they started working on this plan 16-18 months ago and presented their first plan in October.

TER: So this would be a little bit different than the way we think of those European countries being a bit more independent. For example, people might think of an exascale system coming out of Germany before any other countries in Europe, but you’re talking more about a pan- or cross-European collaboration.

BECKMAN: Yeah – I think of this very much like an Airbus strategy. And in fact, the IDC report, which is publicly available, points out that Europe as a whole, has a tremendous amount of fantastic technologies. It’s just that it is scattered in Spain and France, the UK, Germany, the Netherlands, and other places. And if they can weave that technology together, if they can share a common target, then in fact they can be quite powerful. Airbus is an example of taking the technology in Germany and France and the UK and other places and putting it together to make a single airframe and plane.

So I think the Europeans are quite serious about pursuing high performance computing and putting their technology and their lab efforts together – as well as solving the science problems. We’ve mostly been talking about the tech development but the science is also something they have tremendous strength in. Everything from engineering and weather prediction and climate to materials and genomics and everything else.

TER: So Pete, in addition to all this activity of which you’ve had first hand knowledge, you’ve seen all the responses to the recent round of RFPs. I know you can’t talk about a lot of that in detail, but let me ask you, have you seen a promising roadmap that will get us there by 2018?

BECKMAN: There are a lot of folks and responders who believe there is a roadmap and a plan that could get them there in the 2018-2020 timeframe. Now the reason that I’m a little squishy on that (2018-2020) is really two parts. One is that no one really knows what’s going to happen with the Congress and funding, and without understanding what kind of investment there is, it’s hard to know whether or not we’ll make those targets, and the second thing is, we really are dealing with new, exciting technologies and when you are looking out that far ahead – maybe 8-10 years ahead, it’s pretty hard to understand whether or not a particular technology will pop in the way you hope it does and provide you with a solution.

But companies were fairly positive that, with the right investment, we could lead and achieve those goals. Power would still be a problem though. All the companies pointed out that trying to hit a 20MW envelope would be very, very challenging – which we like. We absolutely want to challenge the companies to produce something extraordinarily power efficient. Supercomputing has always been a time machine – it gives you a snapshot of what will be available in the commodity market five years later.

You’ve seen what happens on the Top 500. The machine gets to the top, and then every year starts dwindling down until it drops off, and it’s no longer one of the Top 500 machines. But that process is actually a good process. It means that the technology eventually gets embedded in the rest of the company and the rest of the community and is used – and we want the absolutely lowest power, most efficient machines that we can find and technology they can develop, because our nation’s appetite for computing and cloud computing continues to grow rapidly – unbridled. More and more things that we expect are run in the cloud and that’s taking a tremendous amount of power. There recently was an article about Google’s power budget and it’s quite large. And as we all watch more Netflix and run more Internet services, and push things out into essentially the cloud, we want the technology we’re developing now for exascale for power efficiency to find its way into the rest of the server world.

TER: So are there some points about exascale that you would like to get across to the community that you don’t think are being treated properly or communicated clearly enough?

BECKMAN: Well, I think that one of the things that we touched on was that it’s not about a number one machine. I know that’s easy for people to get their heads around. It’s like the mission to Mars with NASA. Unless you can imagine, ok they’re going to send this to Mars – that’s why they are going, they want to land this probe on Mars. But there’s so much more that is part of that mission – and part of that science – to understanding everything about it – from our planet to other planets in the universe. And the same is true for computing. It’s not just about that number one or number three or top few machines. It’s about all of the science, the education, the new technology that will make it into servers and the design-based planet we will live in where engineering and simulation help us make smart choices. That’s why we invest in this space.

TER: So, let me ask you a tricky question. Regarding the recent disruption with Blue Waters, what kind of impact do you think that’s had on the HPC community?

BECKMAN: That’s actually a pretty good question. It’s very fair. It was a tremendous disappointment in the community. There had never before, in the community, been a time where a very large company essentially chose to change course. Of course our field is replete with many a small company going out of business. The companies that were there 15 years ago, with the exception of just a handful, are not anywhere to be found. Companies like BBN and Thinking Machines and all of those. So it wasn’t uncommon in our space because it is such a high tech driven industry, for things to be in flux. But for a large company to make that choice – that this didn’t end up where we thought it would end up – that was a surprise for us in the community.

Now the good news is that I think it points a lot to the model that works well in these design / build systems. So, the Blue Gene system was designed and built under the supervision and funding of the DOE through Argonne and Livermore Labs. Argonne and Livermore had a contract with IBM to design the Blue Gene “P” and “Q” prototypes which then would get productized later – and delivered to us and we would buy them. That model has been very effective, which is – you work with a company – you’re funding them – but you’re also funding your own people to help with the design because you’re in it all along the way, and can make those tradeoffs, such as, “Look this technology might not be cost effective. We’re going to have to change something.”

And you can make those tradeoffs all through the process, so you end up at the end point with something that everyone can nod their head at and say, “Yes – this is where we wanted to be.”

The model that does not work so well is – I’m going to plop down a contract for something five years in the future and all the technology has to work at the right levels and at the right price and the right way in order for that thing five years from now to be a viable, cost effective product. In our world, that’s a stretch. No one would do that. And I think that the Blue Waters problem demonstrated that the model that works best is a partnership on the architecture from the beginning – with applications, with the DOE labs, and with the vendors so that those tradeoffs can be made all along the way. And if progress isn’t happening, then you have “go/no go” decisions all along the way during the R&D phase, so that you never get to the end point and have a “no go” at the very, very end – you’ve been watching the whole process.

TER: So a much tighter hold on the reins all the way along.

BECKMAN: Yeah, it’s a co-development strategy with the company. Cray has done the same thing. The original Red Storm machine was a partnership between Sandia and Cray. And those sort of very close partnerships on building very complex, breakthrough, disruptive components along the way – they really need to be done in a collaborative environment where the tradeoffs can be made – so it’s a balanced risk.

So let’s look at the future for example. We know that in the memory space, there are two new technologies that folks are interested in – in the solid state memory, non-volatile memory area. One is memristor and the other is phase-change memory. And these are both candidates for exascale computing. The question is for us, as we look to how do we interface with those industries that design and build and create the software to use those things, we need very close partnerships, because right now, those technologies are still under development and we need to be able to change paths or change approach rapidly.

TER: The reality is it’s all going to come down to money. What’s at stake here in February? What level of funding are we looking for and what happens if we don’t get it?

BECKMAN: The funding we’re looking for, when I look at it, it’s quite modest. The investment of several billion dollars over ten years seems pretty small to me given the impact in technology and science and education – the excitement you can get behind a project like this – all the way from the high schools all the way up to colleges and universities. To me, it seems like a pretty good deal – to invest handfuls of billions of dollars over the next ten years to maintain and exceed our current position in technology and science and the computation and simulation space.

What’s at risk is, if we fail to act, then the leadership position that we enjoy right now, because of previous smart people investing in science and technology, will be in jeopardy. It has a trickle down effect when our technology leadership and science leadership is weakened. And it goes all the way back down to the school systems with an impact on attracting kids to get a degree in electrical engineering or computer science or physics or applied math.

TER: What’s your Christmas wish for exascale?

BECKMAN: Well the letter to Obama was fantastic and if there were more people jumping on the bandwagon in Washington, senators and congressmen and representatives, saying, “Look – we need to fix our budget. Absolutely. But we need to maintain our science and technology position, and in fact we need to improve it and compete by designing these new systems.” That would be fantastic. I would have a great Christmas if people could get excited about science and technology leadership for the nation.
