Sandia: Molecular Dynamics Simulation Record Breakers Nominated for Gordon Bell Prize

Two views of a grain boundary in tungsten. (Source: research paper “Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System”)

Sandia National Laboratories announced today a new speed record in molecular dynamics simulation.

A collaborative research team ran simulations using the Cerebras Wafer Scale Engine (WSE) processor and “raced past the maximum speed achievable on the world’s fastest supercomputer by an unprecedented 457 times, where speed is measured in simulation timesteps-per-second,” Sandia announced.

The achievement is a finalist for the Gordon Bell Prize and “could herald a new dawn in molecular dynamics and computational science,” according to Sandia. The prize, sponsored annually by the Association for Computing Machinery, recognizes outstanding achievement in HPC. The winner will be announced at the SC24 high performance computing conference in November in Atlanta.

A new paper, “Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System,” describes the coordinated effort that connected Cerebras’ capabilities to the molecular dynamics simulations already under development at the three NNSA laboratories.

“The ultra-rapid simulation speed provides views for milliseconds instead of nanoseconds, offering a more complete picture of how materials evolve and behave,” said Sandia researcher Siva Rajamanickam, who led the tri-labs’ collaboration with Cerebras. “The ability to perform calculations this rapidly on a general-purpose processor enables the science community to understand chemical processes and material behaviors at timescales previously unachievable in commercially available hardware.”

The Cerebras processor achieves its speed by distributing the simulated atoms to the 900,000 cores on a single WSE. Interactions between the simulated atoms result in communication between the cores on the wafer, rather than between many GPUs as done on Oak Ridge National Laboratory’s Frontier exascale-class supercomputer, the world’s no. 1 ranked HPC system.

Sandia molecular dynamics expert and paper author Stan Moore put it this way: “Latencies in current supercomputers — such as the time it takes to send a message across the network — limit the timescales that can be achieved. On the Cerebras system, these latencies are greatly reduced, allowing orders of magnitude speedup in achievable timescales.”

Researchers from Sandia, Lawrence Livermore and Los Alamos National Laboratories — referred to as the tri-labs — worked together with Cerebras Systems as part of NNSA’s Advanced Memory Technology program. The simulation was performed on Cerebras’ WSE technology.

Potential applications include more detailed studies of the evolution of grain boundaries in metals to create more resilient materials, support for renewable energy researchers designing extended-duration energy storage systems and, with further enhancements, the observation of protein folding and drug-target interactions to accelerate the discovery of life-saving therapies.

Sandia Gordon Bell Team — from left, Aidan Thompson, James Laros, Sivasankaran Rajamanickam and Stan Moore. (Photo by Craig Fritz)

Aidan Thompson, a Sandia co-author of the paper and an expert in molecular dynamics who helped guide Cerebras on the algorithmic details of this project, praised the sheer speed of Cerebras technology.

“The maximum speed of a simulation had remained stubbornly fixed for at least the last decade at around 5 kilosteps per second,” Thompson said. “The Cerebras wafer-scale engine has smashed this barrier by achieving a speed of 699 kilosteps per second. Only highly specialized codes on specialized hardware run faster, and that advantage may not last long.”

“This bodes well for the future impacts of our program and its potential scientific advances,” said James Laros, a distinguished member of technical staff at Sandia and lead of the Advanced Memory Technology program. “The Advanced Memory Technology-based partnership between the NNSA laboratories and Cerebras Systems reached new heights when the speedup on molecular dynamics simulations exceeded the AMT program’s goal, a 40-times performance improvement, by more than 10 times.”
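The figures quoted so far can be checked against one another with a few lines of arithmetic. The sketch below uses only numbers from this article (457 times, 699 kilosteps per second, and the 40-times goal). The implied Frontier rate is a derived value, not one quoted by Sandia, and it assumes the 457-times and 699-kilostep figures describe the same benchmark run, which the article does not state explicitly. The variable names are illustrative.

```python
# Figures quoted in the article.
SPEEDUP_OVER_FRONTIER = 457      # speedup over Frontier, in timesteps per second
WSE_STEPS_PER_SECOND = 699_000   # 699 kilosteps per second on the Cerebras WSE
AMT_GOAL_SPEEDUP = 40            # AMT program goal: a 40-times improvement

# How far the result overshot the program goal.
goal_ratio = SPEEDUP_OVER_FRONTIER / AMT_GOAL_SPEEDUP

# Frontier rate implied by the two quoted figures.
# Assumption: both figures refer to the same benchmark run.
implied_frontier_rate = WSE_STEPS_PER_SECOND / SPEEDUP_OVER_FRONTIER

print(f"Goal exceeded by a factor of {goal_ratio:.1f}")
print(f"Implied Frontier rate: {implied_frontier_rate:.0f} timesteps per second")
```

Under that assumption, the result overshoots the 40-times goal by a factor of roughly 11.4, consistent with “more than 10 times,” and the implied Frontier rate is roughly 1,530 timesteps per second.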

“We all had our doubts about achieving this goal within the short timeframe, but Cerebras’ technology and new methods from our team helped us exceed this goal by demonstrating unprecedented improvement on molecular dynamics simulations,” Rajamanickam said. “These results open opportunities for materials research and science discoveries beyond what we envisioned.”

Innovative architecture enabled the team to surpass the performance level previously achieved. “The tri-labs have been fantastic partners in our journey. It is wonderful to see scientists at the laboratories actively collaborating with our team and pushing our wafer-scale technology to new frontiers,” said Michael James, Cerebras co-founder and chief architect of advanced technologies. “The success in molecular dynamics simulations is hopefully one of many to follow. We are very excited to continue the partnership with NNSA.”

Thuc Hoang, director of NNSA’s Advanced Simulation and Computing program, reflects on the strong partnership between the labs, NNSA and Cerebras.

“The collaboration between Cerebras, Sandia and the tri-lab community illustrates the advantage of industry partnership with the NNSA in the innovative AI technology space,” Hoang said. “It is a great example of the breakthroughs in science that can be achieved together, that would otherwise not be possible by any party on their own. We look forward to seeing continued partnerships with Cerebras and others for both AI and our scientific modeling and simulation missions.”

Stan Moore compares the graphics processing units on Frontier to racehorses: There are thousands of them, and paradoxically, that creates difficulties for relatively small problems.

A Cerebras Systems employee holding a Wafer Scale Engine. (Photo courtesy of Cerebras Systems)

“Imagine a simulation where you are trying to pull a cart,” Moore explained. “If you hook up one horse, it can move the cart, but it is slow. If you hook up eight horses, it goes faster because they can share the weight, but ultimately a horse can only run so fast even if it is pulling little weight. If you hook up 500 horses to the same cart, they just get in each other’s way, so adding too many horses to one cart doesn’t help. In contrast, the Cerebras wafer is like a race car that can pull the cart so much faster.”

Many scientific calculations, while benefiting from the increased speed of Frontier, are small enough that they will benefit even more from Cerebras, Moore said.
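Moore’s cart analogy and his earlier point about latency can be illustrated with a toy strong-scaling model. This is a minimal sketch and not the method used in the paper: it assumes each timestep costs a share of the compute work plus a fixed communication latency, and every number in it is hypothetical. The function name `steps_per_second` is also illustrative.

```python
def steps_per_second(workers: int, compute_s: float, latency_s: float) -> float:
    """Toy model: each timestep costs compute_s / workers plus a fixed latency_s."""
    return 1.0 / (compute_s / workers + latency_s)


# Hypothetical values chosen only for illustration, not measurements.
COMPUTE_S = 1e-2       # time for one worker to compute a full timestep alone
HIGH_LATENCY_S = 2e-4  # per-step communication cost across a network
LOW_LATENCY_S = 2e-6   # per-step communication cost between neighboring cores

for workers in (1, 8, 500, 100_000):
    high = steps_per_second(workers, COMPUTE_S, HIGH_LATENCY_S)
    low = steps_per_second(workers, COMPUTE_S, LOW_LATENCY_S)
    print(f"{workers:>7} workers: {high:>9.0f} steps/s (high latency), "
          f"{low:>9.0f} steps/s (low latency)")
```

In this model the rate can never exceed `1 / latency_s`, however many workers are added, so lowering the latency raises the ceiling. The model is deliberately simple: it shows the rate leveling off, but it does not capture the case in the analogy where extra horses actively get in each other’s way.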

“Accomplishing such a result not only takes vision, but some sort of insane confidence in your ideas,” Rajamanickam said. “We thought it was a measured risk based on our experience and knowledge, but not everyone would think so. When we proposed it to our lab leadership and NNSA, they trusted us to run with it. That’s because we have an environment where we can take measured risks to solve these big problems that others will shy away from.”

Both Laros and Rajamanickam attribute the Advanced Memory Technology program’s role in taking measured risks as part of the investigatory process to the Vanguard program, also led by Laros through the Advanced Simulation and Computing program. While Advanced Memory Technology works towards proof of concept in a technology space, Vanguard focuses on deploying advanced at-scale prototypes.

Vanguard addresses the gap between successful laboratory experiments and large-scale production by industry, filling that void with imaginative, mid-scale prototype platforms that reduce the odds of anything technically infeasible lurking on the road to full scale. Because it requires relatively small funding, a wider range of experimentation and risk is encouraged. If any fail, that’s provisionally good, because something imaginative was probably tried.

Work such as that exemplified in the Gordon Bell submission could help drive a next generation of prototype systems under existing DOE programs, such as Vanguard.

The work is funded by NNSA’s Advanced Simulation and Computing program. The NNSA is a semiautonomous DOE agency responsible for the management and security of the nation’s nuclear weapons, nuclear nonproliferation and naval reactor programs, as well as responding to nuclear and radiological emergencies in the U.S. and abroad.

While Rajamanickam praises Cerebras, he also emphasizes the hard work and calculated risks taken by Sandia researchers that made the project possible.

“The AMT program and this team responded when Congress asked, ‘Can you get us 40 times better performance for far less money?’”

The Congressional request, based on emerging possibilities, was to build a program capable of demonstrating significant speedup over the world’s leading exascale supercomputer.

“We accepted the challenge. In two years, we beat Frontier — using a single Cerebras wafer — by 457 times. We took measured risks, knowing it is OK to fail,” Rajamanickam said. “We were supported by both NNSA and lab leadership. That culture makes a big difference.”

The selection of Cerebras was carefully vetted.

“Cerebras has been making their hardware for artificial intelligence applications,” Rajamanickam said. “We wanted to use it for scientific computing. This is new to them, new to us, and the first time anyone has done anything like this — that we know of — in the world. We worked on demonstrating the approach together for two years, codesigning where computer scientists and material scientists came together to design a new algorithm. Sandia, Cerebras, LLNL and LANL all came together to develop the method, choose the right problem to solve and execute it perfectly.”