Will Exascale Drive an Unprecedented Level of Global Cooperation?


Or Will It Fan the Fires of Global Competition?

Numerous HPC visionaries have cautioned that exascale will require years of global cooperation. That’s right — years of cooperation from the very countries that compete for technological leadership. This could truly be the dawn of a new era. And it appears to be happening.

Right now, working groups are assembling on a global scale to start researching different approaches that might work, from architectures and network interconnects to storage and creative power solutions. There are no exascale experts today, only opinion leaders. So, while we really don’t know where the exascale journey will take us, we are clearly at a critical time for asking questions. The journey has started, and the train is about to leave the station.

Today, there is a major emphasis — backed by many millions of research dollars — on what is anticipated to be a multinational, decade-long development cycle.

According to the International Exascale Software Project (IESP) Roadmap 1.0, dated May 30, 2010, the challenges of architecting “…the massive software foundation of Computational Science in order to meet the new realities of extreme-scale computing are simply too large for any one country, or small consortium of countries, to undertake all on its own.”

This refers primarily to the exascale software stack, or X-stack, a completely new, integrated collection of software developed specifically to support exascale-class systems.

Quite recently, Peter Kogge, retired IBM Fellow and McCourtney Chair in Computer Science and Engineering at the University of Notre Dame, made a number of interesting, thought-provoking points in a presentation titled “The Challenges of Exascale Computing for Astroinformatics Apps.” Drawing on an 18-month DARPA-sponsored study of exascale computing, Kogge’s presentation argues not only that the development of exascale systems will be “really tough,” with “evolutionary progression” coming in the 2020 timeframe, but also, with some eye-catching graphics, that building them will require not just effort but “miracles.”

Our readers will notice that in several of the articles appearing in this first issue of The Exascale Report, “global cooperation” and “co-design” have become common themes among community thought leaders as many individuals and organizations step up to comment on the need for coordinated global cooperation in the quest to develop exascale-level systems. The scope of the challenge is daunting, and the argument for cooperation convincing.

In speaking with a number of community leaders while researching this article, one thing became quite clear: This new “Grand Challenge” surpasses any technological challenge we have faced in the past.

Tickets Please

Around the world, members of the global HPC ecosystem are just starting to board the exascale train for a long journey — an anticipated eight- to 10-year trek. Those who are left at the station will not only miss out on one heck of a ride, they may never make it to the final destination.

To use a different metaphor, what we are engaged in is the race to exascale. Running a sprint in HPC is commonplace. Quarterly and yearly performance milestones and the Top 500 bragging rights have conditioned us to expect technology and milestone leapfrogging. But this race is different. This is a marathon on a course that no one has run before. If any of you think you can bide your time for now and catch up later with a record-breaking sprint, I wish you luck.

Putting Things in Perspective

Those of you who entered the HPC community sometime during the past 10 to 15 years won’t remember the unbridled passion and excitement that marked the entry into the teraFLOPS era.

But, back in the early ’90s, before we had even achieved a working teraFLOPS system, we were discussing petaFLOPS. Getting from the first rallying cry to the accomplishment of the first petaFLOPS system took about 12 to 15 years. And, although we managed to get there, we are all painfully aware that millions of lines of code are being held together with duct tape and super glue. The issue of software not keeping up with hardware has been the topic of much discussion at conferences for the past several years. Today, the lack of optimized software for petaFLOPS machines is causing much concern.

Moving forward, it gets worse. The challenge we face in the quest for exaFLOPS is far more significant and much more difficult than the evolution from teraFLOPS to petaFLOPS. Duct tape and super glue are no longer options. This next journey requires major innovation in system architecture and system software to build the megasystems that will run an entirely new generation of applications, which also need to be developed along the way.

Toward Global Cooperation

To better understand this need for global cooperation, The Exascale Report spoke with HPC pioneer Jack Dongarra. Dongarra, along with Pete Beckman, Division Director of the Argonne Leadership Computing Facility, is a co-founder and co-principal investigator of the International Exascale Software Project (IESP).

Dongarra certainly has the credentials to speak on this topic from an established and well-respected leadership position. He is well known for his role with the Top 500. But what many people in the community don’t know is that he has contributed to the design and implementation of numerous open source software packages and tools in use today, including ATLAS, BLAS, EISPACK, LAPACK, LINPACK, MPI, Netlib, NetSolve, PAPI, PVM, and ScaLAPACK.

According to Dongarra, “This (the quest for exascale) is much bigger than just growing the computer hardware. An entire computing ecosystem must be developed.”

Dongarra sees the challenge as one of balance. “Achieving exascale-level computation will require a balanced investment,” he says. “There are many elements that need to come together and the talent we need to do this comes from countries all around the world.”

He also cautions that real progress toward achieving an exascale system shouldn’t be falsely measured simply by hardware benchmark performance. As an example, he singles out the number two system on the current Top 500, the Dawning Nebulae system in China. “As I see it, this system reflects a massive effort to get a high entry in the Top 500, but this is not an example of a system with a rich ecosystem. It has fundamental issues — it lacks balance.”

In case you missed it, based on the Linpack benchmark, China’s HPC pride and joy, Nebulae, holds the number two slot on the Top 500 list of the world’s most powerful supercomputers. Measured by theoretical peak performance, however, Nebulae is actually the fastest system in the world at 2.98 petaFLOPS.

What this accomplishment does reflect, according to a source in China who has asked to remain anonymous (so we’ll refer to him as Mr. Zheng), is “the highly competitive nature of the Chinese people, driven by a great sense of national pride. Nebulae is a proof point. It demonstrates China’s ability, and to a certain degree our desire to be recognized as a global technology leader. Fortunately, we have the resources to collaborate and compete aggressively in this race.”

But, according to Dongarra, “The lack of investment in the ecosystem for Nebulae becomes apparent quickly when we look for some critical apps running on the system. And this is exactly my point about the need for a balanced investment. Getting to exascale will require far more than just bigger and faster machines. This is all about science. Science is the driver — not benchmarks and not the technology.”

The discussion of global cooperation in the IESP roadmap seems, if anything, understated. The challenge of achieving global cooperation is perhaps even bigger than the technical software challenge itself. When we look at the competitive HPC landscape, it’s hard to imagine long-term cooperation bringing together the U.S., the EU, China, Japan, Korea, and Russia, all working toward a common goal. But in fact, these nations are likely to participate in cooperative exascale development through the IESP, because the risk of not participating presents a far bigger threat to their competitive positions.

In this respect, the IESP holds a key role in not only fostering global cooperation, but also in developing the heart of exascale-level computation — the software stack.

Says Dongarra, “The IESP roadmap will be continually updated. It’s not so much that the participants will have a competitive advantage — it’s more accurate to say those countries that don’t participate will be at a disadvantage. They won’t have any input on the development of the software stack, and they won’t have the early insight into possible new standards that will emerge as a result of this activity.”

He explains, “The goal of the IESP is to come up with an international plan for developing the next generation of open source software for high-performance scientific computing. So, we will develop a roadmap, one that will detail issues and priorities, and describe the software stack that’s necessary for exascale.”

According to Dongarra, the software stack will include various elements from the system side, such as the operating systems, I/O, the external environment, and system management. “It also deals with the development environment, which looks at programming models, frameworks for developing applications, compilers, numerical libraries, and debugging tools. Then there’s another element that tries to integrate applications and use them as a vehicle for testing the ideas.”

Dongarra goes on to define the final piece of the software stack, what he calls “cross-cutting issues” — issues that really impact all of the software, such as resilience, power management, performance optimization, and overall programmability.
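
Taken together, Dongarra’s description amounts to a four-part decomposition of the X-stack. Purely as an organizational aid (the structure below is our own summary in Python, not an official IESP artifact), it might be sketched like this:

```python
# Our own summary of the X-stack areas Dongarra describes above;
# illustrative only, not an official IESP taxonomy.
X_STACK = {
    "system software": ["operating systems", "I/O", "external environment",
                        "system management"],
    "development environment": ["programming models", "application frameworks",
                                "compilers", "numerical libraries",
                                "debugging tools"],
    "applications": ["integrated apps used as vehicles for testing ideas"],
    "cross-cutting issues": ["resilience", "power management",
                             "performance optimization", "programmability"],
}

for area, elements in X_STACK.items():
    print(f"{area}: {', '.join(elements)}")
```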

Most Desirable Approach

The Exascale Report also spoke with Paul Messina, a member of the IESP Executive Committee. According to Messina, “An open, global community-based effort is the most desirable approach to developing the software environment for exascale systems. The IESP workshops have demonstrated that there are people in many countries who are interested in working together on exascale software, and who have relevant expertise and experience.”

Messina believes the playing field is equal right now — no one country is in a better position to field the first exascale system. He says, “The global HPC community will benefit from the availability of the software, and while the U.S. is likely to invest significantly in the development of both exascale-class hardware and software to prepare key applications to take advantage of these systems, other countries can certainly do the same. I do believe that global cooperation on software for exascale systems is both feasible and likely.”

Bill Camp, Intel’s Chief Supercomputing Architect, who is also respectfully referred to as “Mr. Exascale” around Intel, is optimistic about the prospect of global cooperation. Camp believes the various countries and even the vendor organizations must “cooperate to compete.”

He says emphatically, “Frankly, the cost of playing is so high, no one country can shoulder this burden on its own. Achieving exascale will require significant government funding, ongoing investment from industry, and a long-term commitment from all levels of the ecosystem.”

Camp continues, “We’re not talking about hundreds of millions of dollars, we’re talking about billions of dollars. Some of that will be borne by companies, but an awful lot of it will be spread around multiple governments. People will participate in a cooperative fashion because of the potential global impact. And, for many of the industrial organizations, there is potential to make a lot of money by being first to market.”

The cost involved and the level of difficulty in bringing together all the different pieces of a functional exascale system mandate multinational, global cooperation. Funding sources must understand the complexity and scope of this challenge in order to keep their purse strings open for the next eight to 10 years. And that is a major challenge.

Camp’s perspective is interesting: “I’ve been somewhat of a skeptic when people use hype to talk about how ‘hard things are.’ Well, in this case, exascale really is different. Part of the problem we face is that people have been hearing us cry wolf for so long that I’m not sure they believe us when we say how hard this is going to be. This is not just changing an architecture. We are running out of the CMOS roadmap. Even as we shrink CMOS very aggressively, we will still be challenged by the difficulty of maintaining exponential growth in computer capability without requiring a nuclear reactor to power the computer. This is something we’ve never had to face before.”
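
To see why Camp reaches for the nuclear reactor image, a back-of-envelope calculation helps. The sketch below uses an efficiency figure that is our assumption (roughly in line with the most power-efficient systems of 2010), not a number from Camp or from this article:

```python
# Back-of-envelope power estimate for an exaFLOPS system.
# The efficiency figure is an illustrative assumption (~500 MFLOPS per watt,
# roughly Green500-class in 2010), not a number quoted in this article.

EXAFLOPS = 1e18          # one quintillion calculations per second
FLOPS_PER_WATT = 500e6   # assumed 2010-era efficiency: 500 MFLOPS/W

power_watts = EXAFLOPS / FLOPS_PER_WATT
print(f"Estimated power draw: {power_watts / 1e9:.1f} GW")  # ~2.0 GW
```

Two gigawatts is roughly the combined output of two typical nuclear reactors. Even a hundredfold improvement in efficiency would still leave a 20 MW machine, which helps explain why power shows up here as a first-order design problem rather than a facilities detail.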

Of all the technologists we spoke with, very few are concerned about being able to achieve exascale from the standpoint of technology. In fact, to many of them, the realization of exascale-class systems by the end of the decade is inevitable. But those who are concerned raise at least one very good point. One source from a government agency in the U.K. who wishes to remain anonymous, commented, “From a technology standpoint, we can get to exascale within ten years. Of this, I have no doubt. However, from a political support, government funding, and international collaboration standpoint, I seriously doubt that we’ll ever make it.”

The development cycle for achieving exascale will transcend political elections, budgets and commitment changes at many of the funding agencies, and even possible commitment changes from the vendor community — the companies that will drive the base hardware development. When politics and budget reform are in play, there are no certainties.

Looking Down the Track

Dongarra summarizes the goal of the IESP this way: “Today, we don’t really have a global evaluation of missing components within the stack itself. We want to make sure that we understand what the needs are and that the research will cover those needs. So we’re trying to define and develop the priorities to help with this planning process. Ultimately, we feel the scale of investments is such that we really need international input on the requirements. With Americans, Europeans, and Asians working together, we want to develop this larger vision for high performance computing — something that hasn’t been done in the past.”

If we accept the assumption that the hardware will come together, the three biggest non-technology challenges are:

  • Global cooperation on the software development
  • Multinational government funding
  • Steady, ongoing commitment from all parties

Global cooperation has already started. The IESP is off and running and has fairly good representation on a global scale. The U.S. and the European Union are collaborating closely and appear to be stepping up their game. Camp told The Exascale Report, “I expect that we will see more cooperation between the EU and the U.S. on the development of exascale than we have seen between those two blocs with any prior computational initiative.”

The big question: will politicians over the next ten years help to drive the effort or get in the way? Will China, Japan, Korea, and Russia participate in this global effort for the long haul, with coordinated programs and government funding, or move forward on their own?

Competitiveness requires a national strategy. Exascale requires a global strategy. So, is global cooperation possible? Of course. At least in the short term. What we need to ask is whether or not global cooperation can be sustained for the next eight to 10 years. The answer to that is a definitive “maybe.” If any government funding agency takes its eyes off the ball or drops support due to budget changes or political reasons, the exascale train could definitely get derailed.

In future issues, The Exascale Report will be covering the exascale development cycle from various angles, including global cooperation vs. multinational competitive posturing, global commitments, political roadblocks, and the steady evolution of the exascale computing ecosystem.

Let’s close with a quote from our friend, Mr. Zheng in China. “We need exascale to drive scientific research. We need computational power at the exa level in order to better understand and protect this planet. And for me, personally, I think it would be great if the first exascale computer had a very large engraved tag that said, ‘Made in China’.”

A Quick FAQ on FLOPS

A teraFLOPS, in very general terms, is the equivalent of one trillion calculations per second. To put things in perspective, an exascale system would have the processing capability of one million teraFLOPS, or one quintillion calculations per second.

Have you digested that one? I know — I have the same problem.

How about this? A petaFLOPS is 1000 times faster than a teraFLOPS, and an exaFLOPS is 1000 times faster than a petaFLOPS.

If your son or daughter were sketching this out for a grade-school quiz show, it would look like this:

Prefix Value
Kilo 1,000 (10^3)
Mega 1,000,000 (10^6)
Giga 1,000,000,000 (10^9)
Tera 1,000,000,000,000 (10^12)
Peta 1,000,000,000,000,000 (10^15)
Exa 1,000,000,000,000,000,000 (10^18)
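
For readers who prefer code to prefixes, here is a minimal sketch of the same arithmetic (the `SI` table and `flops` function are our own illustration, written in Python):

```python
# SI prefixes from the table above, expressed as raw calculations per second.
SI = {"kilo": 1e3, "mega": 1e6, "giga": 1e9,
      "tera": 1e12, "peta": 1e15, "exa": 1e18}

def flops(value, prefix):
    """Convert a prefixed FLOPS figure (e.g., 2.98 petaFLOPS) to calc/sec."""
    return value * SI[prefix]

# One exaFLOPS is a thousand petaFLOPS and a million teraFLOPS:
assert flops(1, "exa") == 1e3 * flops(1, "peta") == 1e6 * flops(1, "tera")

# Nebulae's 2.98 petaFLOPS theoretical peak vs. an exascale target:
print(flops(1, "exa") / flops(2.98, "peta"))  # ~335x
```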

Jack Dongarra

University Distinguished Professor
University of Tennessee
Department of Electrical Engineering and Computer Science

Director, Innovative Computing Laboratory
Director, Center for Information Technology Research

Distinguished Research Staff
Oak Ridge National Laboratory
Computer Science and Mathematics Division

Turing Fellow
University of Manchester
School of Mathematics
School of Computer Science

Paul C. Messina

Member of the Executive Committee, IESP
Director of Science, Argonne Leadership Computing Facility

Past experience:

  • Director of the California Institute of Technology’s (Caltech) Center for Advanced Computing Research and the principal investigator for TeraGrid.
  • Director, Office of Advanced Simulation and Computing, Defense Programs, National Nuclear Security Administration, Department of Energy. (ASCI program)
  • Founder and Executive Director of the Concurrent Supercomputing Consortium.
  • Advisor to the director general at CERN (European Organization for Nuclear Research).
  • Federal Computer Week’s “Federal 100 Award” for spearheading the acquisition of the Intel Touchstone Delta System and creating the Concurrent Supercomputing Consortium.