ISC 2024’s Wednesday Keynote: Two Trends Transforming HPC

Wednesday night’s keynote session at the ISC 2024 Conference in Hamburg takes on the future of HPC with deep dives into two themes: the rise of specialized architectures and new application workflow capabilities that reduce the terrors and tedium of workload implementation. While we all know that if there’s one thing people can’t predict it’s the future, the two futuristic themes may well turn out to be true because both are already in progress.

Moderated by well-known HPC community veteran Horst Simon (formerly CRO at Lawrence Berkeley National Laboratory, now director of ADIA Lab in the United Arab Emirates), the workflow topic will be discussed by Rosa Badia, manager of the Workflows and Distributed Computing Research Group at the Barcelona Supercomputing Center. The specialized architecture topic will be handled by John Shalf, CTO for the National Energy Research Scientific Computing Center (NERSC) and department head for Computer Science and Data Sciences at Berkeley Lab (and last year’s ISC 2023 program chair).

Wednesday, May 15, 5:45-6:30 pm CET
Congress Center, Hall Z, Third Floor

Overarching both themes is the slowing of Moore’s Law: improvement in HPC performance is decelerating while the cost of each new generation of systems increases. In addition, implementing new systems and realizing their benefits is taking longer as supercomputers grow in size and complexity, including those handling AI, AI for science, traditional HPC simulation and modeling, and data analytics workloads.

This is where the notion of “HPC workflow-as-a-service” (HPCWaaS) and Badia enter the picture. It’s the idea of taking containers and orchestration to a higher level with the express purpose of making supercomputing power more accessible and more customized for specific workload requirements.

“There is this need for complex workflows in HPC that combines traditional HPC applications, like simulation and modelling, the typical MPI applications,” she told us, “but the user has the need to combine these with new tools, new parts of the code, that comes from AI or big data. And these, all together in a single workflow, it’s better if it can be orchestrated as a single application.”

She emphasized that for many HPC users, getting applications up and running is increasingly difficult.

“For non-experts, it’s not easy, it’s an environment that sometimes is not friendly,” she said. “So we’ve been developing methodologies to make it easier for not only the development but also the deployment and execution of these complex and unique workloads. We have called these HPC workflow-as-a-service in such a way that we provide this methodology, through containers, that can deploy the applications. An objective is to enlarge the HPC community, widen the HPC infrastructure to more users who are less expert at systems management.”

HPCWaaS is the focus of the EuroHPC initiative’s eFlows4HPC project. It initially targets three application areas: advanced manufacturing, climate studies and the prediction of catastrophic natural events, such as earthquakes and tsunamis. Each application and workflow presents unique requirements that would be difficult to address with a monolithic workflow approach.
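To make the "single workflow, single application" idea concrete, here is a minimal sketch in plain Python of a simulation step, an AI-style step and an analytics step declared as one dependency graph and run by one small orchestrator. It is an illustration only, not the eFlows4HPC software or any BSC tool: the step functions, the `WORKFLOW` table and `run_workflow` are hypothetical names, and the "simulation" is a stand-in that returns a few numbers.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_simulation(inputs):
    # Stand-in for a traditional HPC step (e.g. an MPI simulation).
    return {"field": [i * i for i in range(5)]}


def train_surrogate(inputs):
    # Stand-in for an AI step that consumes the simulation output.
    field = inputs["simulate"]["field"]
    return {"mean": sum(field) / len(field)}


def analyze(inputs):
    # Stand-in for a data analytics step that combines both earlier results.
    field = inputs["simulate"]["field"]
    mean = inputs["train"]["mean"]
    return {"above_mean": sum(1 for value in field if value > mean)}


# Hypothetical workflow description: step name -> (function, dependencies).
WORKFLOW = {
    "simulate": (run_simulation, []),
    "train": (train_surrogate, ["simulate"]),
    "analyze": (analyze, ["simulate", "train"]),
}


def run_workflow(workflow):
    """Run every step once, in an order that respects its dependencies."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        results[name] = func({dep: results[dep] for dep in deps})
    return results


results = run_workflow(WORKFLOW)
print(results["analyze"])
```

In a real deployment each step would be a containerized application scheduled on a supercomputer; the sketch only shows why describing the steps together lets one orchestrator handle ordering and data hand-off for the user.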

Moving on to Shalf’s theme, he told us the HPC community is heading toward “a crisis” in which critically important scientific challenges that require massive compute and memory, such as climate change, are outpacing the capabilities of HPC systems.

“I’ve got a plot showing we went from, on average, a 1000X improvement in the performance of HPC systems for each system generation two decades ago, now we’re down to only a 2x performance improvement every decade.”

The pressure for faster systems is driving new approaches to chip building that Shalf characterized as cost effective, that are being pursued by different parts of the microelectronics industry and that “could get us back on track.” He’s talking about chiplets, specifically specialized chiplets configured with a mix-and-match approach to meet each application’s unique priorities.

For demanding HPC workloads, this means moving away from general-purpose HPC.

“One of the go-to tricks for continuing to extract performance out of electronic systems is to tailor them for the computational problem they’re solving,” Shalf said.

Thirty years ago, Cray delivered custom vector machines, “but we know we can’t afford to develop systems like that anymore, that approach requires too many resources, too much up-front investment.”

Interest in the chiplets ecosystem is driven by hyperscale data center operators such as Amazon and Google, which want to develop custom architectures from chiplets that have different, partitioned functionality. “So rather than us having to design everything in the system, you can actually try to insert specialized functionality into this chiplets ecosystem to get specialization at a reasonable cost.”

Shalf cited AMD as an early adopter that has developed an internal ecosystem of chiplets enabling “them to only spin one generation of chiplets but then create lots of different variants of their design just by rearranging the chiplets, rather than having to have a separate chip for each one.”

From the vendor side this is a key development because the mask costs of a modern fab are extraordinarily high. He also said that if a chip company developed reticle-limited designs in the form of massive GPU chips on a single piece of silicon, “you’ll have to cost out 90 percent of the chip, which also is very expensive.”

Chiplets address a number of problems. “It solves a modularity problem, so you can get lots of different variants of your machine targeting different markets. It also solves this cost yield curve problem, and it solves the problem cost of generating (a lithography) mask to fabricate a new chip,” he said.
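The "cost yield curve" argument can be illustrated with a back-of-the-envelope calculation. The sketch below uses the textbook Poisson yield model, in which the fraction of defect-free dies falls exponentially with die area. The defect density and die area are assumed values chosen for illustration, not figures from Shalf, and the model ignores packaging cost, assembly yield and inter-chiplet interconnect overhead.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_per_good_product(total_area_mm2: float, n_dies: int,
                             defects_per_mm2: float) -> float:
    """Silicon area fabricated per working product built from n equal dies.

    Assumes each die is tested before assembly, so a defect only
    discards the one die it lands on.
    """
    die_area = total_area_mm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defects_per_mm2)


DEFECT_DENSITY = 0.002  # defects per mm^2 (assumed for illustration)
TOTAL_AREA = 800.0      # mm^2 of logic in the product (assumed)

monolithic = silicon_per_good_product(TOTAL_AREA, 1, DEFECT_DENSITY)
chiplets = silicon_per_good_product(TOTAL_AREA, 8, DEFECT_DENSITY)

print(f"monolithic die yield: {poisson_yield(TOTAL_AREA, DEFECT_DENSITY):.1%}")
print(f"chiplet die yield:    {poisson_yield(TOTAL_AREA / 8, DEFECT_DENSITY):.1%}")
print(f"silicon per good product, monolithic vs. chiplets: "
      f"{monolithic / chiplets:.1f}x")
```

Under these assumptions most large monolithic dies contain at least one defect, while most of the smaller chiplets do not, which is the qualitative effect behind the quote. Real yield models and real defect densities differ, so the ratio should be read as a direction rather than a prediction.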

The next step for chiplets: “opening up that ecosystem so that you can combine parts together from multiple players to develop increasingly customized systems to target your workload.”

“There’s a strong market pull, it’s the customers that are demanding this, they’d like to be able to mix and match for multiple vendors so you don’t have to be trapped inside of a single company’s ecosystem,” said Shalf. “That’s pushing open standards like UCIe (Universal Chiplet Interconnect Express) and other standards to enable third parties to all play together and develop their own products without having a single vendor controlling everything. That’s the direction things are going, but we’re not there yet. We might not even be there in a decade. But that’s the direction that the industry is pushing towards.”

Shalf said much of this demand is coming from the hyperscalers “bending the market to their will, they’re really pushing this chiplets concept.”

And the leading hyperscalers, which constitute markets in their own right, usually get what they want.
