Scientists Using Frontier Supercomputer Win 2022 Gordon Bell Prize, Another Frontier Team Named Prize Finalist

[SPONSORED CONTENT]   How many researchers can say they’ve run their scientific job not only on the AMD-powered Frontier supercomputer, the world’s no. 1 ranked HPC system and the first exascale-class machine, but also on Fugaku, Summit and Perlmutter, the world’s second-, fifth- and eighth-ranked HPC systems, respectively (see TOP500 list)?

But that’s the case with an international group of researchers working on particle-in-cell simulations who have developed code that won this year’s Gordon Bell Prize (see related news story) from the Association for Computing Machinery (ACM) for outstanding achievement in HPC.

Another team of researchers, from the U.S. Department of Energy’s Oak Ridge National Laboratory (whose Oak Ridge Leadership Computing Facility houses Frontier), Georgia Institute of Technology and University of California San Francisco, has been named a prize finalist for using Frontier and machine learning data mining techniques to search millions of medical and scientific papers and publications for overlooked potential treatments for illness and disease.

Frontier, powered by 3rd Gen AMD EPYC processors and AMD Instinct MI250X accelerators and built on the HPE Cray EX supercomputing architecture, was delivered to the U.S. Department of Energy’s Oak Ridge National Laboratory late last year. It then went through months of testing and tuning that culminated in the system becoming the first to exceed the exascale threshold (a billion billion [10^18] calculations per second) in time for last spring’s biannual TOP500 list.

Frontier tuning is still ongoing and full user-readiness is expected early next year. But the two Gordon Bell Prize nominees demonstrate that important scientific research is already underway on the system.

Let’s first look at the prize-winning team from DOE’s Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory and the French Alternative Energies and Atomic Energy Commission (CEA), who have developed a new particle accelerator simulation code called WarpX. The software is the first mesh-refined, particle-in-cell (MR PIC) code for kinetic plasma simulations optimized for parallel computing on Frontier, Fugaku, Summit and Perlmutter.

“The MR PIC code enabled 3D simulations of laser-matter interactions…,” the researchers stated in the abstract of a paper on their work, “which have so far been out of the reach of standard codes. These simulations helped remove a major limitation of compact laser-based electron accelerators, which are promising candidates for next generation high-energy physics experiments and ultra-high dose rate FLASH radiotherapy.”

Accelerators direct high-speed particles at target materials to break them up, enabling the study of the properties of matter. Berkeley Senior Scientist and project leader Jean-Luc Vay said WarpX was developed to simulate plasma-based accelerators used in a range of fields, including cancer treatments and semiconductor manufacturing. The goal is to investigate whether small, relatively inexpensive particle accelerators can do some of the work of large accelerators, such as the 16-mile-long Large Hadron Collider in Switzerland or the Spallation Neutron Source at Oak Ridge, mammoth projects that took years to construct.

We spoke with four of the researchers about their work on Frontier.

On the software front, Berkeley Lab’s Axel Huebl said the researchers worked on an early iteration of Frontier through the auspices of the Exascale Computing Project (ECP) to help tune the system for full-scale runs of WarpX. This meant that by July, Frontier was able to do runs of the code on roughly 8,500 of Frontier’s 9,400 nodes.

“We worked through the ECP to give feedback to the vendors, so the moment Frontier was ready, we were ready to go,” Huebl said.

As for performance, he said, “We ran (WarpX) on all the large-scale machines in the world that we could get our hands on, and Frontier is more powerful than any predecessor we had before. We were very grateful we could use it. Performance-wise, we needed large, 3-D simulations at very fine resolution for the physics in play… Using 8,500 nodes, Frontier performed really efficiently, which means we can do significantly larger science cases now.”

The second team of Frontier users who were nominated for the Gordon Bell Prize – for advanced data mining of medical information – delivered another star turn for the new supercomputer. In fact, the researchers said Frontier exceeded one exaFLOPS in the execution of the application.

A major problem in medicine is that research information exceeds human capacity to ingest it. So the research team, led by ORNL’s Group Leader for Discrete Algorithms, Ramakrishnan Kannan, began working on a graph-theoretical approach to data mining. The target: mining of scientific articles, specifically biomedical literature, to discover unknown relationships among concepts. One example is an environmental agency discovering a previously unnoticed link between a toxin and a medical condition. Another is a pharmaceutical laboratory discovering a previously overlooked candidate drug for a disease.

The effort began by targeting COVID-19 research.

“We started by taking the CORD-19 data set, which contains publications and preprints on COVID-19 and other coronaviruses such as SARS and MERS,” said Jakub Kurzak, Senior Member of Technical Staff at AMD, who supported Kannan and the project team using Frontier. “This data set contains over 1 million papers, which on its own represents a very large volume of knowledge. But then we combine it with the PubMed data set, which contains 34 million articles on life sciences and biomedical topics, some dating back to 1809. We ended up with a problem of exploring over 300 thousand concepts linked by over 100 million relationships, a far larger data corpus than any individual human can explore.”

Kurzak said an objective of the project is to improve existing medical knowledge graphs, such as SPOKE (Scalable PrecisiOn Medicine Knowledge Engine) operated by the University of California San Francisco. It combines data from over 30 sources and contains 3 million nodes and 15 million edges.

“The scientific question is if we can automatically find links that have not been discovered yet,” Kurzak said. “The answer is ‘yes.’ We discovered 181 paths that existed in SPOKE partially and 159 paths that did not exist in SPOKE at all. Each newly discovered path means that the algorithm has determined an important connection that was not explicitly captured by human curated sources. Each new path may be a new link between a symptom and a disease or may point to a new drug candidate and may, therefore, have lifesaving consequences.”

Kurzak said Frontier executed superbly – the application running on the system crossed the performance of one exaFLOPS.

“The highest performance that we are aware of for this kind of problem was achieved in 2020 using the Summit system at ORNL – 136 petaFLOPS,” he said, “which means we improved the performance by over 7x. A 7x improvement gives us the ability to do in a matter of days the amount of work that used to take weeks. This is also a historic milestone for a real application in graph analytics.”
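
The speedup Kurzak cites can be checked with simple arithmetic. The sketch below assumes an application rate of exactly one exaFLOPS (the article says only that the run crossed that mark) and takes the 136 petaFLOPS Summit figure from the quote. The 21-day baseline is a hypothetical number chosen purely for illustration.

```python
# Rough check of the "over 7x" figure quoted above.
PETA = 10**15
EXA = 10**18

summit_rate = 136 * PETA   # FLOPS, taken from the quote
frontier_rate = 1 * EXA    # FLOPS, assumed: exactly one exaFLOPS

speedup = frontier_rate / summit_rate

# Hypothetical workload: three weeks of compute at the older rate.
baseline_days = 21
new_days = baseline_days / speedup

print(f"speedup: {speedup:.2f}x")
print(f"{baseline_days} days shrink to about {new_days:.1f} days")
```

Under these assumptions the ratio comes out a little above 7, and a three-week job drops to roughly three days, consistent with the "weeks to days" description.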

He added that Frontier’s power is hardly comprehensible.

“Crossing an exaFLOPS still feels a little surreal,” said Kurzak, “even more so, crossing an exaFLOPS when doing actual, useful science. A billion billion operations per second is beyond human comprehension. If all 8 billion people on the earth completed one calculation per second, it would take them over 4 years to perform the amount of work that Frontier can do in one second. But that is precisely why we can find all shortest paths in a graph with 30 million nodes and 120 million edges.”
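
To make "all shortest paths" concrete, here is a minimal sketch of the textbook Floyd-Warshall algorithm on a toy graph. The article does not say which algorithm or data layout the team used, so the choice of Floyd-Warshall is an assumption made for illustration. The function name `all_pairs_shortest_paths` and the four-concept example graph are hypothetical.

```python
import math


def all_pairs_shortest_paths(n, edges):
    """Return an n x n matrix of shortest-path lengths (Floyd-Warshall).

    edges is a list of (u, v, weight) tuples for an undirected graph.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        # Keep the lightest edge if the same pair appears more than once.
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Allow each node k in turn to act as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical toy graph: 0 = drug, 1 = gene, 2 = symptom, 3 = disease.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 5)]
dist = all_pairs_shortest_paths(4, edges)
print(dist[0][3])  # prints 3
```

In this toy case the three-hop route from drug to disease (length 3) beats the direct edge of weight 5. The triple loop performs on the order of n^3 updates for n nodes, which gives a sense of why a graph with tens of millions of nodes calls for a machine of Frontier's scale.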

Kurzak also was impressed by Frontier’s user readiness. “It still surprises me sometimes how similar the x86 software environment on Frontier is to that of my Threadripper plus Radeon VII Linux box. A lot of mainstream development tools are readily available on Frontier, because it is an x86 system running Linux.”

Part of this can be attributed to AMD’s ROCm software stack for graphics processing unit programming, which is completely open source. “On a couple of occasions, we were asked for a version of a library with some secret sauce in it and ended up pointing people to GitHub, because it is all out there in the open. Notably, there is no secret sauce in the code we developed for the Gordon Bell submission. The GPU kernel that made the exaFLOPS possible is written in plain C++, no assembly, no proprietary extensions of any kind. The Clang compiler in the public release of ROCm did its job. We hope it leaves no doubt about the strength of AMD’s commitment to open-source software.”