At ISC 2024, Hamburg — Aurora, the Intel-HPE Cray problem-child supercomputer, has officially received the blessing of the Top500 organization as having surpassed the exascale (a billion billion calculations/second) milestone, but barely. According to the new Top500 list released today in Hamburg at the ISC 2024 conference, Aurora turned in a high-performance LINPACK (HPL) benchmark of 1.012 exaFLOPS.
That places Aurora at no. 2 behind the reigning no. 1 system, Frontier, an AMD-HPE Cray system housed at Oak Ridge National Laboratory. Frontier, with a LINPACK score of 1.206 exaFLOPS, has been the top system since May 2022.
Aurora thus becomes the second system to officially achieve exascale-class power, though it is known that China has several such systems that have not undergone Top500 LINPACK benchmarking.
Even with Aurora in the exascale club, the system continues to attract controversy. Originally scheduled to be the first exascale-class system, it was delayed several times past its original delivery date, largely because of Intel’s late delivery of its “Ponte Vecchio” GPU. In addition, there has been widespread expectation that Aurora will reach 2 exaFLOPS when installation and system tuning, still underway, are completed. But according to a senior Argonne manager speaking at a media pre-brief here in Hamburg, that’s an incorrect impression originally put forward by a former Intel executive who was speaking about Aurora’s theoretical peak performance, not its delivered performance. More on this below.
As for the most closely watched aspect of the Top500, the top 10 most powerful systems remain largely the same as they were six months ago at the SC23 conference. The one new system in the top 10 is Alps, at no. 6, from the Swiss National Supercomputing Centre (CSCS). The HPE Cray system, powered by Nvidia Grace processors, achieved an HPL score of 270 PFlop/s.
Looking at the rest of the top 10, the Eagle system installed on the Microsoft Azure cloud held the no. 3 position it achieved in its debut on the previous list, and it’s still the highest-ranking cloud system on the Top500. This Microsoft NDv5 system has an HPL score of 561.2 PFlop/s and is based on Intel Xeon Platinum 8480C processors and Nvidia H100 GPU accelerators.
Fugaku, the Arm-based system at Japan’s RIKEN Center for Computational Science, is at no. 4 with an HPL of 442 PFlop/s and it remains the highest-ranked system outside the U.S.
The AMD/HPE Cray LUMI system at EuroHPC/CSC in Finland is no. 5 with an HPL score of 380 PFlop/s. This machine is the largest system in Europe.
After Alps, the no. 7 system is Leonardo, installed at CINECA, a EuroHPC site in Italy. It is an Atos BullSequana XH2000 system powered by Xeon Platinum CPUs and Nvidia A100 accelerators, with quad-rail Nvidia HDR100 InfiniBand as the interconnect. It achieved an HPL score of 241.2 PFlop/s.
The MareNostrum 5 ACC system, installed at the EuroHPC/Barcelona Supercomputing Center in Spain, is now at no. 8. This BullSequana XH3000 system uses Xeon Platinum 8460Y processors with Nvidia H100 GPUs and InfiniBand NDR200. It achieved an HPL score of 175.3 PFlop/s.
Summit, an IBM-built system at Oak Ridge National Laboratory, is now the no. 9 system. Still operational past its scheduled retirement date, the system has an HPL score of 148.6 PFlop/s. Summit has 4,356 nodes, each housing two POWER9 CPUs with 22 cores each and six Nvidia Tesla V100 GPUs, each with 80 streaming multiprocessors (SMs).
The Eos system, listed at no. 10, is an in-house Nvidia DGX SuperPOD powered by H100 GPUs with Xeon Platinum 8480C processors.
The Top500 organization also pointed out that the United States and China, though the latter no longer takes part in Top500 HPL benchmarking, remain the two countries with the most entries on the list. The U.S. added seven systems over the previous list, bringing its total to 168. China’s count fell from 104 systems to 80. “In fact, China did not report a single new machine for this new list,” Top500 said.
However, the 63rd edition of the list shows an upset in terms of representation from entire continents. North America kept the top spot by increasing from 160 machines on the previous list to 171, while Asia dropped from 169 machines to 148. Europe increased from 143 systems to 160, officially overtaking Asia and putting Europe in second place behind North America.
Green500
This edition of the Green500 saw a shakeup with all of the top three machines being new to the list.
The no. 1 spot on the Green500 now belongs to JEDI (JUPITER Exascale Development Instrument), a new system from EuroHPC/FZJ in Germany. Ranked no. 190 on the Top500, JEDI achieved an energy efficiency rating of 72.73 GFlops/Watt while producing an HPL score of 4.5 PFlop/s. JEDI is a BullSequana XH3000 machine built on the Grace Hopper Superchip 72C, with 19,584 total cores.
The Isambard-AI machine at the University of Bristol in the UK claimed the No. 2 spot with an energy efficiency rating of 68.83 GFlops/Watt and an HPL score of 7.42 PFlop/s. Isambard-AI achieved the No. 129 spot on the Top500 and has 34,272 total cores.
The No. 3 spot was claimed by the Helios system at Cyfronet in Poland. The machine achieved an energy efficiency score of 66.95 GFlops/Watt and an HPL score of 19.14 PFlop/s.
Top500 said the Frontier system “deserves an honorable mention” when discussing energy efficiency. Frontier achieved an exascale HPL score of 1.206 EFlop/s while also earning an energy efficiency score of 56.97 GFlops/Watt. This places the system at No. 11 on the Green500 in addition to its No. 1 spot on the Top500.
Aurora
Returning to what may become the next Aurora controversy, amid expectations that the system will ultimately attain 2 exaFLOPS performance, associate lab director and Argonne distinguished fellow Rick Stevens said this is an incorrect impression created by a former Intel executive involved in the development and delivery of Aurora.
Stevens said 2 exaFLOPS is a theoretical peak number, arrived at by multiplying the number of Aurora’s cores by each core’s peak performance capability, but it was never a performance target for the Argonne system managers responsible for standing up Aurora.
“Peak is a number that’s calculated, it’s a theoretical number by calculating clock rates times the operation counts of the individual computing elements,” Stevens said last night. “So, typical systems achieve fractions of their peak, somewhere between 50, 60, 70 percent, depending on what they’re doing. The 2 exaFLOPS peak number on Aurora is actually a combination of adding FLOPS from the GPUs and CPUs. And typically, in a benchmark, you wouldn’t do that. If you look at other comparable systems on the Top500, they will also have very large peak numbers, and the fraction of peak that’s being accomplished on Aurora is similar to the fraction of peak being accomplished on those other systems. So there’s no fundamental issue here.”
That said, Stevens explained that Aurora should realize a LINPACK boost because 11 percent of Aurora blades had yet to be engaged when the most recent benchmarks were run. When system installation is completed, a higher benchmark figure is expected, Stevens said, though he declined to offer a specific number targeted by Argonne.
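The arithmetic behind Stevens' remarks can be sketched in a few lines of Python. This is an illustration only, not Argonne's or Top500's methodology. The 1.012 and 2 exaFLOPS figures and the 11 percent of idle blades come from this article. The function names are hypothetical, the sample device in the first call is made up, and the full-system estimate assumes perfectly linear scaling, which real HPL runs rarely achieve.

```python
# Figures quoted in this article, in exaFLOPS unless noted.
AURORA_HPL = 1.012    # measured HPL result
AURORA_PEAK = 2.0     # approximate theoretical peak (GPUs plus CPUs)
BLADES_IDLE = 0.11    # share of blades not engaged during the run


def theoretical_peak(elements: int, clock_hz: float, flops_per_cycle: float) -> float:
    """Peak FLOPS = compute elements x clock rate x operations per cycle."""
    return elements * clock_hz * flops_per_cycle


def fraction_of_peak(measured: float, peak: float) -> float:
    """Share of the theoretical peak that a benchmark run delivered."""
    return measured / peak


def naive_full_system_estimate(measured: float, idle_share: float) -> float:
    """Scale a partial-system result to the whole machine.

    Assumes perfectly linear scaling (an assumption made for illustration).
    """
    return measured / (1.0 - idle_share)


if __name__ == "__main__":
    # Made-up device: 1,000 elements at 1.5 GHz, 32 FLOPs per cycle each.
    print(theoretical_peak(1_000, 1.5e9, 32))
    print(f"{fraction_of_peak(AURORA_HPL, AURORA_PEAK):.1%}")
    print(f"{naive_full_system_estimate(AURORA_HPL, BLADES_IDLE):.3f}")
```

With the article's figures, the run works out to roughly 51 percent of the quoted peak, and the naive scaling lands near 1.14 exaFLOPS. Argonne has not published a target, so the second number is a back-of-the-envelope illustration rather than a forecast.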
Regarding public expectations of Aurora, HPC-AI industry analyst Addison Snell, CEO of Intersect360 Research, said:
“I think it was the general industry expectation that this would be delivering above and beyond Frontier on LINPACK, on peak, on other benchmarks. Now, standing up any exascale system, or any system at this scale, is a tremendous achievement. And it’s important to remember that the point isn’t LINPACK. The point is the scientific gains that we’re going to get out of this system. But nevertheless, it’s hard to get away from the notion that this system is short of the expectations that we had.”
Earl Joseph, CEO of HPC-AI industry analyst firm Hyperion Research, came down on the side of Argonne’s less lofty expectations.
“All along, because it’s a new processor from Intel and the interconnected system, Argonne has been very nervous about what will actually land as far as a LINPACK number, and so they’ve had a lot of uncertainties about that,” he said. “My guess is over time, it’s going to improve a little bit. I agree with Rick, I don’t think they’re ever going to actually get 2.0 (exaFLOPS).”
Calling himself an optimist, Joseph said he thinks Aurora will ultimately exceed 1.5 exaFLOPS.