Initiatives are being launched, research centers are being established, teams are being formed, but in reality, we are barely getting started with exascale research. Opinions vary as to where we should be focusing our resources.
The Exascale Report Asks
Where should we (as a global community) be placing our efforts today with exascale research and development?
As application developers, we only really know two things about (early) exascale supercomputers at the moment. First, roughly billion-way concurrency will be required to exploit them. Second, data movement (bandwidth and latency) at every level of the memory and interconnect hierarchy will be considerably more constrained than on today’s supercomputers.
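A quick back-of-the-envelope calculation shows where the billion-way figure comes from. The clock rate and flops-per-cycle values below are illustrative assumptions, not specifications of any planned system:

```python
# Rough concurrency estimate for a 1 exaflop/s machine.
# Assumed values (illustrative only): ~1.5 GHz cores, each sustaining
# ~4 flops per cycle via SIMD/FMA.
target_flops = 1e18              # 1 exaflop/s
clock_hz = 1.5e9                 # assumed core clock
flops_per_core_per_cycle = 4     # assumed sustained rate per core

cores_needed = target_flops / (clock_hz * flops_per_core_per_cycle)
print(f"~{cores_needed:.1e} cores running concurrently")   # ~1.7e+08

# Hiding memory and network latency typically requires several threads
# or tasks in flight per core, pushing the software-visible concurrency
# toward 10^9 -- hence "billion-way".
```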
We might speculate further about the details of an exascale system (processor types and features, heterogeneity, resilience, etc.), or about which software technologies, languages, and so on will emerge as winners. But if we are to have applications ready to run, and doing science, on the first exascale systems from day one, then many of the application design choices (language, method(s) of parallelism, algorithms, etc.) will have to be made before we know any of the details for sure.
We are still early enough in the exascale roadmap to truly influence system design choices – but only if we know enough about our applications and science needs to give clear advice to hardware and software technology architects.
The global community (especially on the applications development and user side) must focus its efforts today on pragmatic rather than elegant exploitation of early exascale computers. That is, find the most economical path to useful science as soon as possible. This might mean starting afresh with new codes instead of evolving established ones. Or it might mean making the minimal set of evolutionary steps to an existing code to get good-enough (not ideal) performance at exascale compared to petascale. Performance should be measured in application output, not in artificial metrics like FLOPS or percentage of peak performance.
Why my insistence on pragmatic and early? Two reasons. First, early, high-profile science output will be essential to justify the (public) funding invested to that point and to secure further funding for future development. Second, early insights from real-world applications on exascale computers will be needed to properly inform the development of software technologies and system designs as the exascale era continues.
Hopefully, these two together (continued funding and properly informed technology development) will ensure that future generations of exascale supercomputers are broadly useful and economically viable, and so can deliver the benefits to science, the economy, and society that we believe they can.
When it comes to exascale, we have a choice. We can try to build exascale systems using existing mainstream hardware and software architectures, or we can do it with disruptive ones. Existing architectures are convenient because they are the technology we’re used to. Building truly disruptive systems will require more bravery and innovation. Given the challenges of exascale, we may have to be disruptive. We may have little choice.
As we look to the future, supercomputing is at a crossroads. The next generation of exascale supercomputers cannot be built simply by refining and scaling up current technology. We need a national R&D program to develop new technical solutions that maintain our continued leadership in HPC and provide exascale computing to enable success in the development of new energy technologies, in scientific advancement and discovery, and in U.S. national security.
The traditional exponential growth in clock rate that gave us a 2x performance improvement every 18-24 months over the last 15 years has ended. Now, instead of increasing the clock frequency of a chip, we are doubling the number of cores on it. Multi-core systems come in many varieties, each with its own challenges in harnessing on the order of a million, or potentially a billion, cores on a single problem, let alone getting the operating system, performance tools, debuggers, and so on to work across that much parallelism. Reliability and resiliency become additional problems at that level of parallelism. And as we gang more and more cores together, the electricity required to power the system and its associated memory becomes staggering. In fact, it becomes the overarching technology challenge to solve in reaching exascale.
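To see why power dominates, a simple energy budget helps. The 20 MW machine-power envelope below is a commonly discussed facility limit, used here only as an assumption:

```python
# Energy budget per operation at exascale.
# Assumed power envelope: ~20 MW for the whole system (a figure often
# cited as a practical facility limit; treat it as an assumption).
power_watts = 20e6        # 20 MW
rate_flops = 1e18         # 1 exaflop/s

joules_per_flop = power_watts / rate_flops
print(f"{joules_per_flop * 1e12:.0f} pJ per operation")   # 20 pJ

# That ~20 pJ must cover not only the arithmetic itself but also the
# data movement, memory, interconnect, and cooling overhead associated
# with each operation -- which is why power efficiency and data
# movement, not raw flops, set the design constraints.
```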
To overcome these technical challenges, we will need to reduce the electricity required to power supercomputers by a factor of 5 to 10. Computers at this scale will need “self-awareness” to overcome the failures that are inevitable in machines with millions or billions of components. We need to develop new technical solutions to increase memory capacity and bandwidth to keep pace with processor speeds, to increase the reliability of operating systems and components, and to develop new algorithms. Exascale “exemplar” applications must also be developed as part of a co-design process to ensure R&D is directed at critical U.S. endeavors. Success will require strong collaboration between industry, the DOE laboratories, and academia.
We are in a global competition with determined rivals. The proposed US Exascale Computing Initiative will be an investment in American competitiveness and in retaining global leadership in the supercomputing field, with potentially huge benefits in energy, environment, health, and national security applications, and large spin-off impacts in computer systems, information technology, and commerce to the benefit of businesses and consumers alike.