DOE Document Reveals Next-Gen Supercomputing Strategy: A Move to More Modular, Faster Upgrade Cycles

Less than a month after its Frontier system broke the exascale performance barrier and claimed the no. 1 spot in the world supercomputing rankings, the U.S. Department of Energy today issued a request for information (RFI) revealing its strategic thinking for the next generation of leadership-class supercomputers extending out to 2030. The document calls for “the development of an approach that moves away from monolithic acquisitions toward a model for enabling more rapid upgrade cycles of deployed systems, to enable faster innovation on hardware and software.”

DOE said it expects its next-gen systems “to operate within a power envelope of 20-60 MW.” Other than that, the document, entitled “Advanced Computing Ecosystems Request for Information,” is general in nature and does not specify the number of systems the agency plans to acquire, the funding required to build them, or precise performance expectations. Nor is it prescriptive about how its objectives should be accomplished.

“One possible strategy would include increased reuse of existing infrastructure so that the upgrades are modular,” DOE said. “A goal would be to reimagine systems architecture and an efficient acquisition process that allows continuous injection of technological advances to a facility (e.g., every 12–24 months rather than every 4–5 years).”

That said, the agency is open to pushback:

“Understanding the tradeoffs of these approaches is one goal of this RFI, and we invite responses to include perceived benefits and/or disadvantages of this modular upgrade approach.”

In other respects the RFI does not represent a major departure from DOE’s next-gen system planning after Summit, which was stood up in 2018 and succeeded by Frontier, installed at Oak Ridge National Laboratory last year.

For example, DOE states it intends to “plan, design, commission, and acquire the next generation of supercomputing systems in the 2025 to 2030 timeframe,” and that the agency “is interested in the deployment of one or more supercomputers that can solve scientific problems 5 to 10 times faster—or solve more complex problems, such as those with more physics or requirements for higher fidelity—than the current state-of-the-art systems.”

Which is to say, the delivery timeframe and the performance advancement are roughly within the parameters of the Summit-to-Frontier plan.

But the call for more agility and modularity is something new, and it seems to be centered on DOE’s notion of an Advanced Computing Ecosystem (ACE), which in turn reflects the broadening workload scope of current HPC/AI technologies and systems, their greater heterogeneity, and a premium placed on resource sharing. This includes the desire for “a capable software stack (that) will meet the requirements of a broad spectrum of applications and workloads, including large-scale computational science campaigns in modeling and simulation, machine intelligence, and integrated data analysis.”

“Future DOE supercomputers will need to tackle scientific discovery challenges against a backdrop of emerging edge computing technology, data science, and machine learning advances, in addition to traditional modeling and simulation application requirements,” the department stated. “DOE also is planning for and designing an… (ACE) for this timeframe that will enable integration with other DOE facilities, including light source, data, materials science and advanced manufacturing. The next generation of supercomputers will need to be capable of being integrated into an ACE environment that supports automated workflows, combining one or more of these facilities to reduce the time from experiment and observation to scientific insight.”