Successful Deployment at Extreme Scale: More than Just the Iron

I have deployed many systems that, in their day, were large scale: several top 30 and a couple top 20 systems. They were deployed in stable programs with established user communities and did not, for the most part, represent a radical departure from what had been done in the past. We worried about clearing floor tiles and getting the power and cooling in the right places. After that, it was all about installing, configuring, and wringing the bugs out of a machine that was built principally from proven parts for people with existing codes that they anticipated would run as expected. Which they usually did.

Other than a new user manual, an updated FAQ, and perhaps a training course the vendor threw in to sweeten the deal, there wasn’t much thought about anything else. But as we cross through the trans-petaFLOPS regime on the way to exascale computing, a new pattern is emerging for high-end deployments.

Machines at 10, 20, and 50 petaFLOPS are large enough to break just about everything in the computing ecosystem. Hardware aggregated at this scale starts to show increased failure rates that we have only worried about theoretically in the past. Interconnects and system software have never been tested at this scale, and both hard and soft errors take flight from previously uninteresting corners of the system. And users begin to dig deeply into both their applications and the science supporting those applications as they attempt to break down the next important barriers in science. All of these stressors become more pronounced the larger the machine, and most people anticipate that the transition into exascale will be disruptive both to users and to the centers fielding these systems.
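To make the reliability stressor concrete, here is a minimal back-of-envelope sketch, assuming independent node failures; the node counts and MTBF figure are purely illustrative and do not come from Blue Waters or any specific system:

```python
# Illustrative only: with independent, exponentially distributed failures,
# the mean time between failures (MTBF) of the whole system shrinks roughly
# in proportion to the number of components that can take a job down.

def system_mtbf_hours(component_mtbf_hours: float, component_count: int) -> float:
    """Approximate system MTBF when any single component failure interrupts a job."""
    return component_mtbf_hours / component_count

node_mtbf = 5 * 365 * 24  # hypothetical node rated at ~5 years MTBF (~43,800 hours)

for nodes in (1_000, 10_000, 50_000):
    print(f"{nodes:>6} nodes -> system MTBF ~ {system_mtbf_hours(node_mtbf, nodes):.1f} hours")

# Roughly 44 hours at 1,000 nodes, 4.4 hours at 10,000, and under an hour at 50,000:
# failure modes that were once theoretical become an everyday operational concern.
```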

Successful deployments need to manage these stressors for both their users and their programs, and the job is not easy.

Users first

As Richard Hamming so famously said, “The purpose of computing is insight, not numbers.” Analogously, the purpose of fielding extreme scale systems is discovery, not computing. Systems are fielded to enable breakthroughs in science, engineering, medicine, and the humanities that have the potential to dramatically, and occasionally radically, improve our quality of life. This means that job one is making sure that the users and their applications can make meaningful use of the systems being fielded.

Although it is certainly not the only example of effective community building around a deployment, the Blue Waters deployment in Illinois is arguably one of the most comprehensive efforts to date. Bill Kramer is the deputy project director and co-principal investigator for the Blue Waters project at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. Over the past several years Bill and his team have been focused on building the facility and designing a system that, when finally turned on next year, will provide 10+ petaFLOPS for open science. But if you’ve been following what the Blue Waters team has been doing, you know that they have taken a radically different approach to the launch of this capability into the community.

Getting the system fielded is only the beginning of their efforts, not the end. The really innovative things the Blue Waters team is doing can be seen in its focus on training potential users, evangelizing the machine and its capabilities, and reaching out to new disciplines that should be able to benefit from the capability. In short, the team is building a community around the resource — a community of users, architects, administrators, and developers who will work together and support one another once the machine is launched.

[Image: Bill Kramer, NCSA]

“We have worked very hard to build a community to recruit people to try things at a scale that has never been tried before,” says Kramer. “Our goal is that on the first day the machine becomes available, applications are ready to go.”

A common model for machine deployment is that users, who largely are not funded to develop software for machines that aren’t yet available, wait until they have a reason to begin working on the new architecture; for example, the new machine is available and the old machine will be turned off in three months. But with the Blue Waters project, the NSF funded the user community under the Petascale Computing Resource Allocations (PRAC) program to travel and to participate in design and learning activities related to Blue Waters in exchange for significant allocations on the machine once it goes live. There are currently 18 awards involving about 80 researchers at 32 institutions. Kramer explains that each team has one to five codes, and funding is on average $40,000 per project. The funding helps make early participation possible for academic teams, but Kramer says that what really motivates them to participate is the promise of time on the new machine, and the opportunity to be a part of creating the next generation supercomputer.

New money, old costs

[Image: Andy Jones, NAG]

If you do the math on the PRAC investment, it comes up just short of $1M, and another round of funded projects is coming, bringing the total to a significant investment. But this money pays for a small part of an expense that is almost always present when new machines are fielded: the research teams that intend to use the resource have to learn about it and invest the time to tune their algorithms for the unique features of the platform. This is what Andrew Jones, vice-president of HPC Consulting Services at NAG, calls the cost of science. “What the NSF has done with the Blue Waters project,” he explains, “is take into account the cost of science, not just the cost of supercomputer ownership.”

Turning the design process around

Bill Kramer also identifies another key element of the Blue Waters project that is playing a large part in other exascale efforts (such as those being undertaken in the US Department of Energy): co-design. The principle is that hardware and software designers work side-by-side with people representing the science applications throughout the design of the system. This ensures that each team has the opportunity to influence the decisions of the other as much as possible before either has made a significant investment in a particular approach. This is in stark contrast to the more traditional approach in which hardware designers go into a locked room and bring out a finished system five years later for everyone else to start figuring out how to use.

But does co-design actually provide real benefits in practice? Kramer says yes. He describes an exchange mid-way through the project in which a feature of the original Blue Waters interconnect was identified that would have substantially limited performance for user applications. IBM took this feedback from the co-design sessions and re-designed the communications chip to eliminate the bottlenecks. Simulations of the revised hardware indicate that performance is substantially improved for the affected class of applications.

The new normal?

But all of this is a lot of work. Are system providers going to be faced with these substantially more complex projects from now on? Is there something about the scale of these machines that will always demand this kind of added effort? “The work we are doing here, and the way in which we are doing it, is similar to previous efforts I’ve been involved in where there has been a lot of innovation,” says Kramer. “Certainly future efforts that have similar levels of innovation will need to take the same steps, but it isn’t necessarily the case that all large scale deployments in the future will need to have all of these elements in place.”

Andy Jones echoes that sentiment, “The Blue Waters project in particular has had a lot of lead time, and funding, to get the user community ready for this new machine. The co-design process they are using for this first-off machine gives them a unique opportunity with both the time and the incentive to build a community that groups deploying more conventional technology don’t usually have.”

“The traditional acquisition approach, in which programs will spend a year or more selecting a machine and then want to deploy it as quickly as possible, makes it very difficult to build a community and get codes ready ahead of time,” says Jones. The problem, he explains, is that very few investments are made in software or in using the systems. “The software must be viewed as part of the scientific instrument, in this case a supercomputer, that needs its own investment. High performance computing is really about the software; whatever hardware you are using is just an accelerator system.”

Jones draws upon the experience of the scientific community for inspiration. “If you look at CERN, or the SKA telescope, they have a 10-15 year planning effort with another 10-15 years of operations and upgrades. All published ahead of time so that the community can plan, prepare, and make the investments that will enable them to make the most use of the equipment that will be available to them. In HPC a machine is deployed and then obsolete within three years. And the users often have no idea what architecture is coming next. There is no real chance for planning, or a return on software development investment.”

Exascale: pushing all the buttons at the same time

Looking past present-day deployments in the trans-petaFLOPS regime, the HPC community is in the early stages of the shift to exascale computing within a constrained power envelope (a 20 MW system is the generally agreed-upon target, and this constraint is what motivates much of the innovation being discussed in supercomputing system design today). And while the level of innovation required to build a practical exascale system this decade is astonishing, the timescales on which the changes must take place are terrifying. If you don’t have enough to keep you awake at night, read the innovation timelines in the roadmap recently published by the International Exascale Software Project.
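A simple energy calculation shows why the 20 MW target forces so much innovation. The sketch below uses only the exaFLOPS goal and the 20 MW envelope mentioned above; treating the machine as sustaining a full exaFLOPS is an assumption made for the estimate:

```python
# Illustrative estimate: an exaFLOPS system in a 20 MW envelope has on the order
# of 20 picojoules to spend per floating-point operation, and that budget must
# also cover memory, interconnect, storage, and cooling overheads.

power_watts = 20e6   # 20 MW facility power target
flops = 1e18         # 1 exaFLOPS sustained (assumed for this estimate)

joules_per_flop = power_watts / flops
print(f"Energy budget: {joules_per_flop * 1e12:.0f} pJ per floating-point operation")
# -> 20 pJ/FLOP, a far tighter budget than the nanojoule-class figures typical
#    of petascale-era systems.
```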

[Image: Rick Stevens, ANL]

Rick Stevens of Argonne National Laboratory has been part of the leadership of the US Department of Energy’s exascale efforts since the beginning, and he rattles off fundamental changes in the way everything from system memory (no more page access for DRAMs and no more generic memory as controllers are moved on chip) to processor interconnects will have to work. It is a truly daunting proposition, and he outlines a process for getting it done that includes co-design with applications developers, along with new levels of cooperation and collaboration in common components, programming models, and system software.

If you are thinking about building a community to support intense levels of innovation, and you don’t include the vendors who have to provide the products representing the innovation, you are not giving yourself much chance of success. Exascale deployments not only have to build technology and applications communities, they also have to build new kinds of community between the centers fielding the systems and the vendors building them. The DOE approach is a strong example of that (as is the partnership the Blue Waters team has with IBM).

Moms, dads, and the places that supercomputers live

All of the complex, highly coordinated activities that are part and parcel of deploying extreme scale systems at this point in history have a visible impact on the communities, lives, and wallets of the places where these machines are installed.

“For a large deployment like Blue Waters,” observes Bill Bell, division director for Public Affairs at the National Center for Supercomputing Applications, “the community is spending money and dedicating resources to the establishment of the new system. In our local community both the state and the University of Illinois have invested considerable time and energy into this project. We wanted to make sure that our local community understood what was being done with that investment, and why it was worthwhile. At our Community Day event (highlighting our new National Petascale Computing Facility) we invited everyone in the local area to come by and let us show them what we were doing. We were expecting IT guys and techies; 1,000 people showed up. They had watched this enormous building going up, and they wanted to know what it was all about.”

At these kinds of local community events, Bell’s team emphasizes the benefits of the science done on NCSA’s machines. “We describe the unique history that NCSA and the University of Illinois have in HPC, and try to demonstrate the scientific relevance of HPC to their everyday lives.”

Ultimately, much of our activity in HPC, especially at the extreme high end, is funded by taxpayers at all levels of government. The facilities we build take up space in places that might otherwise be used more directly by local residents for houses, parks, or shopping centers. All of this has an impact far beyond the exclusive community of professionals who build and use HPC. These issues are perhaps as far from the iron in a supercomputing deployment as one can get, but they are absolutely critical to a successful and sustained long-term deployment.

Building the highest of high performance computers has always been a complex technical challenge, with vigorous innovation catalyzing aggressive change and bringing with it uncertainty for both users and providers. This is a trend that is sure to persist. However, one thing is certain — as we continue to push the boundaries of what is possible in pursuit of a better understanding of the world around us, deploying extreme scale supercomputers involves far more than just the iron.