In this special guest feature, Earl Joseph from IDC describes his SC15 panel where four HPC luminaries discussed, disputed, and divined the path to exascale computing.
Many experts say that a new high-performance computing (HPC) architecture is required to achieve exascale computing and beyond, meaning that everything has to change. Is this an exaggeration?
During a recent spirited panel discussion on the topic “Directions in HPC Architecture: Everything Changes,” four well-known and highly respected HPC luminaries weighed in. Bronis de Supinski, CTO of Livermore Computing at Lawrence Livermore National Laboratory; Intel Fellows Mark Seager and Al Gara; and Thomas Sterling of Indiana University (CREST), the highly regarded father of Beowulf, shared ideas about the future direction of HPC architecture. I was thrilled to moderate this entertaining and informative panel discussion and offer this summary.
How much has to change?
The consensus of the panel was that getting to exascale will be an evolutionary approach with revolutionary technologies, rather than a straight-on revolutionary journey. But that’s about the only consensus there was. This group of experts comes at HPC from different perspectives—commercial, academic, and research—as well as different backgrounds and architecture preferences.
Mark Seager kicked off the discussion by hearkening back to the terascale era, reminding us that the very same debates occurred then. Would we need a whole new architecture? How would we ever manage the power? The level of parallelism would be impossible to code for. And such machines would only stay up for 12 nanoseconds before crashing.
“They were all wrong!” Seager said. “Not just a little wrong but completely wrong. We’re making all the same predictions for exascale. Yes, it will be harder, because we’re trying to get to a million times terascale. But at Intel, we think we can get there through a combination of revolutionary and evolutionary approaches.”
He went on to say that it’s critical to embrace a transitional versus a disruptive architecture to bring along the substantial existing application base that runs on current HPC systems. Intel envisions exascale-class systems that can run existing applications while delivering a significant performance boost, with only incremental changes to the applications required. However, these systems would also offer enough exciting new capability to enable a whole new class of applications to take advantage of emerging, revolutionary technologies.
Legendary system architect Al Gara reinforced this. “We don’t want to break it unless we know we have to break it,” he said of architectural opportunities. “I see many ways that we can get much better energy efficiency without having to throw out everything we currently have.” The Intel Fellows pointed out that even before we worry about exascale, there is still a great deal of inefficiency in current terascale and petascale systems that can be tightened up to yield big performance boosts. Seager noted that some petascale systems run applications at 1 percent efficiency—pretty abysmal. Imagine raising that efficiency to 10 percent: a 10X performance improvement for virtually no extra power. So making current programming models more efficient is an important first step on the way to exascale.
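To spell out that arithmetic (a back-of-the-envelope illustration with assumed numbers, not figures quoted by the panel):

```latex
% Sustained performance = efficiency x peak, at a roughly fixed power budget.
% Take a hypothetical 10 PF/s-peak system as an example:
\[
\begin{aligned}
\text{at } 1\%:\;  & 0.01 \times 10~\text{PF/s} = 0.1~\text{PF/s sustained},\\
\text{at } 10\%:\; & 0.10 \times 10~\text{PF/s} = 1.0~\text{PF/s sustained},
\end{aligned}
\qquad \frac{0.10}{0.01} = 10\times \text{ the delivered performance for essentially the same power.}
\]
```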
de Supinski stepped in as contrarian and said that while he hated to agree with his Intel colleagues, the whole question was an exaggeration. “In one sense, nothing is going to change,” he said. “Our systems will still have processors, memory, and networks, so they’re not going to look that incredibly different. I would say the only real open question in terms of what we build systems from is, will we still have spinning disks? But otherwise, they won’t be so radically different that we won’t know how to use them.”
The father of Beowulf took de Supinski to task and claimed, with intentional humor, that his youth prevented him from understanding the concept of change. “For those of us who started out punching cards in the 1960s, execution pipelines and instruction-level parallelism were a huge breakthrough,” Sterling said. “In the 1970s it was vectors, which were pretty big. In the 1980s, there were commercial SIMD-level machines. In the 1990s it was multiprocessors and Beowulf clusters. In the 2000s, we had multicores. And finally, in the 2010s, we find anything you can attach to the wrong end of a PCI bus, including accelerators. So when someone tells me that nothing changes, I disagree. Change is continuous, though tempered in pace.”
Sterling went on to point out that change needn’t mean throwing things away. Change can be accomplished through an additive process. And even when change at the architectural level occurs, it may not even be visible to the user. Sterling believes that these additive changes will be in the areas of global address space, global address translation, rapid lightweight task creation, context switching for scalability, and efficient message-driven communication as a way to address efficient data movement through complex, highly parallel systems.
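To make “rapid lightweight task creation” concrete, here is a minimal sketch using everyday OpenMP tasks; it is only an illustrative stand-in for the fine-grained tasking Sterling describes, not the specific runtime technology he has in mind, and the function and cutoff value are arbitrary choices for the example.

```c
/* Minimal sketch of lightweight task creation using OpenMP tasks.
 * Illustrative only; not any particular exascale runtime.
 * Build with OpenMP enabled (e.g., cc -fopenmp). */
#include <stdio.h>

static long fib(int n)
{
    if (n < 25)   /* cutoff: below this, spawning a task costs more than it saves */
        return n < 2 ? n : fib(n - 1) + fib(n - 2);

    long x, y;
    #pragma omp task shared(x)  /* each call spawns a cheap logical task...      */
    x = fib(n - 1);
    #pragma omp task shared(y)  /* ...and the runtime maps tasks onto real cores */
    y = fib(n - 2);
    #pragma omp taskwait        /* wait for the two child tasks to finish */
    return x + y;
}

int main(void)
{
    long result;
    #pragma omp parallel        /* create the worker team */
    #pragma omp single          /* one thread seeds the task graph */
    result = fib(40);
    printf("fib(40) = %ld\n", result);
    return 0;
}
```

The point is simply that logical tasks are cheap to create, and the runtime, not the programmer, decides which core runs each one.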
One of the exciting new architectural changes, Sterling added, might be the elimination of classical cores. While Sterling is well known for throwing an occasional curve ball or challenging the status quo, surprisingly, no one jumped on this position. One might assume the others either agreed with it or chose not to dispute it. From the IDC perspective, we are hoping that some fundamental new approaches are developed and used, and replacing classical cores with something very different could be one of these changes.
Another innovation that may be a part of exascale architectures is transistor specialization—designing and targeting some of a chip’s billions of transistors for specialized algorithms and functions. An exciting possibility indeed.
Biggest barriers on path to exascale
I next asked my scrappy guests what they saw as the most challenging architectural problems standing in the way of reaching exascale.
de Supinski said the most challenging exascale problem is going to be data movement through a complex memory hierarchy. “We’ve already seen the emergence of innovations such as high-bandwidth memory, which provides the ability to keep the memory bandwidths closer to the CPUs and GPUs using them, but those memory schemes are very expensive,” de Supinski said. “We’re seeing the emergence of an even richer set of memory technologies, with speeds and feeds increasing, but managing that ultra high-speed memory is going to be a challenge.”
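For a small, concrete taste of what “managing” a faster memory tier can look like today, here is a hedged sketch using the open-source memkind library’s hbwmalloc interface, which lets an application place selected arrays in on-package high-bandwidth memory; the fallback policy shown is only illustrative, not a recommendation from the panel.

```c
/* Sketch: place a bandwidth-critical array in high-bandwidth memory when it
 * is present, and fall back to ordinary DRAM when it is not.
 * Uses the memkind library's hbwmalloc interface; link with -lmemkind. */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>

int main(void)
{
    size_t n = (size_t)1 << 26;            /* 64M doubles: a hot working set */
    int have_hbw = (hbw_check_available() == 0);

    double *field = have_hbw ? hbw_malloc(n * sizeof *field)  /* fast tier   */
                             : malloc(n * sizeof *field);     /* normal DRAM */
    if (!field)
        return 1;

    for (size_t i = 0; i < n; ++i)         /* bandwidth-bound initialization */
        field[i] = 0.0;

    printf("array placed in %s memory\n",
           have_hbw ? "high-bandwidth" : "regular");

    if (have_hbw) hbw_free(field);
    else          free(field);
    return 0;
}
```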
Seager claimed that power was the biggest issue—moving data around efficiently, with less energy usage. Seager pointed out that in the last 20 years, Linpack has progressed at more than twice the speed of Moore’s Law in terms of the compound annual performance growth rate, and we got there one way: through brute force. “It was called MPP, massively parallel processing—packing in more cores, more racks, more power,” he said. “This level of physical resource multiplication is simply not sustainable for another 20 years. It would require an increase of 10⁴ in power.”
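One hedged way to reconstruct that figure (my reading, not Seager’s stated derivation): if performance must again grow roughly a millionfold over 20 years while energy efficiency improves only about a hundredfold, brute-force hardware replication has to make up the remaining factor in power.

```latex
% Assumed growth factors over ~20 years (illustrative, not from the panel):
%   performance target:   10^6
%   flops-per-watt gains: 10^2
\[
\Delta P \;\approx\; \frac{\Delta\,\text{performance}}{\Delta\,\text{efficiency}}
        \;=\; \frac{10^{6}}{10^{2}} \;=\; 10^{4}.
\]
```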
Sterling disagreed with both de Supinski and Seager. He claimed that the biggest problem is not power but managing the incredible levels of parallelism that exascale computing will bring. He pointed out that even in a single processor today, the amount of parallelism is enormous—on the order of thousands of execution threads. With exascale computing, we’ll see billion-way parallelism.
“The HPC community faces a big challenge in optimizing code for this level of parallelism,” Sterling said. Challenges include resource starvation due to improperly written scheduling algorithms; latency due to the inability to efficiently feed data to all threads; system overhead, which imposes bounds on how effective a massively parallel system can be; and contention, where multiple requesters ask for a shared resource. This is, of course, a topic Intel knows quite well with its Modern Code initiative.
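A textbook-style scaling model (not Sterling’s own formulation) makes the overhead point concrete:

```latex
% Split a job of serial time T_1 across p workers, each paying a fixed
% overhead o for scheduling and synchronization:
\[
T(p) \approx \frac{T_1}{p} + o,
\qquad
S(p) = \frac{T_1}{T(p)},
\qquad
\lim_{p \to \infty} S(p) = \frac{T_1}{o}.
\]
% Speedup saturates at T_1/o no matter how many cores are added, which is
% why driving per-task overhead down is central to exascale efficiency.
```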
Reliability problems can be tamed
An audience member asked if there were assumptions we hold today about exascale computing that will probably be proven wrong 20 years from now. Seager claimed it will be the worries over reliability. He thinks we’ll find good methodologies for dealing with the FIT (failures in time) rate as we scale into billions of cores. He said, in fact, that Intel is already hammering away at FIT rates with current processor architectures by integrating ever more functionality on the processor and eliminating extra chips and memory. Improving reliability is largely a matter of time and money, the panelists agreed; it’s not a fundamental problem that we don’t know how to solve.
Sterling, however, pointed out that reliability problems at the exascale level will be largely software-related, not hardware-related. Further, he said that past a certain level of scaling, software bugs are largely indistinguishable from hardware failures. “The complexity and chaos of computation is so extreme at the exascale level that we can apply the same mechanisms of software debugging to hardware reliability,” Sterling said. The industry will need to develop a rigorous discipline for debugging exascale-caliber software.
Will exascale systems be usable?
Finally, I asked how users can ever be expected to take advantage of systems that have millions or billions of cores. A new hardware model that is nearly impossible to use from a software perspective is not very useful and would fail to take advantage of exascale performance possibilities.
Sterling suggested that the industry needs to adopt a global parallel execution model to replace the standard message-passing model. This will provide a discipline, or paradigm, by which application developers can consider all the layers—hardware, software, application programming algorithms, and operating system—and define how they fit together. This will greatly simplify usability.
Part of the goal is to separate the physical cores from the logical parallel action so that a user thinks only about the logical action. This is not a new idea and is being implemented in a number of ways, Sterling said, but it requires dynamic mapping and adaptive scheduling. Advanced, sophisticated runtime software will unburden the user and exploit dynamic runtime state information that a compiler can never predict, improving user productivity and performance portability. By reducing overhead, hiding latency, and increasing the opportunity for parallelism, such a runtime simplifies, rather than complicates, what is demanded of the user.
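Here is a small sketch of decoupling logical parallelism from physical cores, again using everyday OpenMP rather than the advanced adaptive runtimes Sterling describes: the program expresses far more logical work units than there are cores, and the runtime hands them to whichever threads are free. The chunk count and cost formula are arbitrary illustrative choices.

```c
/* Sketch: over-decompose the work into many logical chunks and let the
 * runtime map them onto whatever cores are free. OpenMP's dynamic schedule
 * stands in here for far more sophisticated adaptive runtimes.
 * Build with OpenMP enabled and the math library (e.g., cc -fopenmp ... -lm). */
#include <stdio.h>
#include <math.h>
#include <omp.h>

#define NCHUNKS 10000           /* logical work units: far more than cores */

int main(void)
{
    double total = 0.0;

    /* Each iteration is a logical unit whose cost varies; schedule(dynamic)
     * hands chunks to idle threads at run time instead of fixing the
     * mapping to physical cores up front. */
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int c = 0; c < NCHUNKS; ++c) {
        int work = 1000 + (c % 97) * 500;   /* uneven, data-dependent cost */
        double s = 0.0;
        for (int i = 0; i < work; ++i)
            s += sin((double)i / (c + 1));
        total += s;
    }

    printf("ran %d logical chunks on up to %d threads, total = %f\n",
           NCHUNKS, omp_get_max_threads(), total);
    return 0;
}
```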
de Supinski shot back with a warning that, with all the dynamic moving of computation around in such an adaptive system, you risk flooding the system with a huge amount of overhead. “The real key to designing exascale-era applications will be developing programming models that better describe the programmer’s intentions and the requirements of the application,” de Supinski said. “The compiler needs to be provided with the information it needs to match the code that’s being generated to the needs of the application.”
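For a down-to-earth analogue of handing the compiler the programmer’s intent (far simpler than the exascale programming models de Supinski means), C’s restrict qualifier and OpenMP’s simd directive both declare properties the compiler cannot safely infer on its own:

```c
/* Sketch: declaring intent the compiler cannot infer by itself.
 * 'restrict' promises the arrays do not alias; 'omp simd' states that the
 * iterations are independent and safe to vectorize. Modest, everyday
 * analogues of the richer intent-carrying models discussed on the panel.
 * Build with optimization and OpenMP enabled (e.g., cc -O2 -fopenmp). */
#include <stddef.h>

void axpy(size_t n, double a,
          const double *restrict x,   /* promise: x and y never overlap */
          double *restrict y)
{
    #pragma omp simd                  /* promise: iterations are independent */
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```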
One other concern de Supinski has about dynamic “tasking everywhere” programming models is that they often don’t do a good job of reflecting the underlying architecture, which he said is critical.
As the panel wound to a close, the participants agreed on one thing: the path to exascale contains significant obstacles, but they’re not insurmountable. Tremendous progress is being made in preparing codes for the next generations of systems, and sheer determination and innovation are running at an all-time high.
It’s clear that exascale computing leadership will require substantial funding, and in many cases users will need to convince their funders that the ROI that comes with exascale leadership, in both economic and scientific terms, is worth the large investments required.
View the entire panel discussion, recorded in the Intel Community Hub at SC15.