This interview appears courtesy of The Exascale Report.
Peg Williams was recently named to the position of Senior Vice President, HPC Systems at Cray. In her new role, Williams is responsible for the company’s R&D efforts along with product and business line management. With more than 20 years of experience in HPC, she has a keen understanding of the increasing complexity of high-end HPC systems, and she evangelizes the critical role of software for this next generation of systems.
The Exascale Report: With the recent announcement of Titan, we can expect to see more discussion around accelerators, so let’s talk about the software component of exascale systems. Why do you feel software has become a critical discussion for all parties when talking about exascale?
WILLIAMS: I think there are several key roles that software plays in the discussion of exascale. One is in the area of reliability – what can you do on the software side to impact reliability, isolating the user from machine failures and allowing the system to continue to run.
There’s also work that needs to be done on the software side relative to programming models. One of the big issues application developers have going forward is dealing with the complexity of these very large systems. For example, at the application level, it’s not only how do developers find the parallelism, but once they do, how do they describe where it is? How can they express all of that parallelism in a way that a compiler or a set of tools can actually map it to the machine?
These are some of the big challenges on the software side. I think software has to help in the reliability piece of this, and it certainly has to help in making sure there is a way to express the parallelism, and then, once it’s expressed, to take that and map it to the architecture.
TER: And you’re really talking about the programming models at this point?
WILLIAMS: Mostly the programming environments that the vendors create around the community-adopted programming models. In our programming environment we have compilers and tools and libraries and all kinds of things that work together, but yes, I’m mostly talking about that marriage between what the application developer needs to do and how they need to hand it over to someone who might be building a compiler or tool set or library set to get the job done.
TER: And this is only complicated I’m guessing when we add accelerators into the equation?
WILLIAMS: Yes – the fact that accelerators are there forces you to have to find the parallelism to use them. If you have an accelerator but there’s not enough parallelism, or you can’t tell me about the parallelism in your application to allow me to map it to the accelerator – it won’t do you any good.
This is actually an area where I think Cray has an advantage, especially if you look at a GPU-like architecture, and I think the many-core space is like this too. If you think of a GPU at its highest level, a GPU is like a vector processor. It’s going to compute like a vector computes. There’s no company that understands vector processing better than Cray.
So we’ve got a history of being able to find parallelism in code and map that parallelism on to vector-like architectures. There’s a rich history of skills here that I believe we can bring to bear in this accelerator space. It’s the human capital that we have that understands this problem incredibly well because we’ve been dealing with it for so long.
TER: To what extent do you think this can be automated with the use of compilers and tools?
WILLIAMS: So I actually think we’re going to see more of this, and we’re building a programming environment that does this.
You’re going to have to have a more integrated look between the application developer and the tool set going back and forth.
First of all, you have whatever language you ended up with, or whatever way the application uses to express its parallelism. Right now we are working with a model of extensions to OpenMP directives plus MPI. But there has to be a loop between the application developer and the tool set. He puts his directives into OpenMP, the compiler runs and looks at the code, and there’s a set of decisions it can make and a set of decisions it can’t make because it doesn’t have enough information. There has to be a feedback loop back to the code developer that says, if you could tell me these three things, I could do a better job. Then the application developer looks at that and sees if he can provide more hints and tips to the compiler.

Then you go through the performance analysis loop: you grab performance data, and there’s another set of things the compilers and the tools can tell you about your execution and feed back to you. At that point you can come back in and optimize.

So I think there has to be a tight relationship, a tight marriage between the application developer and the tool set you build, so there’s a communication that can go on. I do think a lot of that can be automated, but there has to be a lot of interaction initially to get there. I do think you can take the burden down. A guideline I’ve been giving my programming environment team is this: if, through the automated systems they can build, the application developer can get 80% of the performance for 20% of the effort, we’re good. That’s the 80/20 I want to look at. And we think we can get there. But indeed there is a learning curve for all of us to go through as we start to work with these accelerators.
TER: So what words of advice do you have for the new guys coming in to HPC?
WILLIAMS: You know, I like to think of it more in line with what advice do they have for me? The folks coming out of school today are quite different from the traditional users of supercomputers. I’ll give you a side story here. We had some folks come in this summer to help us do an evaluation of the productivity of some of our tools. We wanted to do some experiments and we were describing the environment that we wanted them to run in, and when we suggested they run Fortran codes, they looked at us with total and complete blank stares. So we changed it and let them work in C++, and they were fine. The times are certainly changing.
So what advice do I have for them? I think they have to come in with complete openness and flexibility and not get locked into the models that we’ve had in the past. I’m a realist in the sense that a lot of the codes we have today have been running for years and people tweak them and they change them and they adapt them for the current architecture, but for those who are developing applications from scratch, my advice is be flexible, be open, and be creative, because with the kinds of architectures we have today, creativity can be rewarded. Think outside the box.