MLPerf Training and HPC Benchmarks Show 49X Performance Gains in 5 Years


Today, MLCommons announced new results from two MLPerf benchmark suites: the MLPerf Training v3.1 suite, which measures the performance of training machine learning models, and the MLPerf HPC v3.0 suite, which targets supercomputers and measures the performance of training machine learning models for scientific applications and data.

MLCommons said the results highlight performance gains of up to 2.8X compared to five months ago and 49X over the first results five years ago, “reflecting the tremendous rate of innovation in systems for machine learning.”

Significant to this round is the largest system ever submitted to MLPerf Training. Comprising over 10,000 accelerators, it demonstrates the extraordinary progress the machine learning community has made in scaling system size to advance the training of neural networks.

To view the results for MLPerf Training v3.1 and MLPerf HPC v3.0 and find additional information about the benchmarks, visit the Training and HPC benchmark pages.

The MLPerf Training benchmark suite comprises full system tests that stress machine learning models, software, and hardware for a broad range of applications. The open-source and peer-reviewed benchmark suite provides a level playing field for competition that drives innovation, performance, and energy-efficiency for the entire industry.

MLPerf Training v3.1 includes over 200 performance results from 19 submitting organizations: Ailiverse, ASUSTek, Azure, Azure+NVIDIA, Clemson University Research Computing and Data, CTuning, Dell, Fujitsu, GigaComputing, Google, Intel+Habana Labs, Krai, Lenovo, NVIDIA, NVIDIA+CoreWeave, Quanta Cloud Technology, Supermicro, Supermicro+Red Hat, and xFusion. MLCommons would like to especially congratulate first-time MLPerf Training submitters Ailiverse, Clemson University Research Computing and Data, CTuning Foundation, and Red Hat.

MLPerf Training v3.1 introduces the new Stable Diffusion generative AI benchmark model to the suite. Based on Stability AI’s Stable Diffusion v2 latent diffusion model, Stable Diffusion takes text prompts as inputs and generates photorealistic images as output. It is the core technology behind an emerging and exciting class of tools and applications such as Midjourney and Lensa.

“Adding Stable Diffusion to the benchmark suite is timely, given how image generation has exploded in popularity,” said Eric Han, MLPerf Training co-chair. “This is a critical new area – extending Generative AI to the visual domain.”

MLCommons added the GPT-3 benchmark to MLPerf Training v3.0 last June. In just five months, the large language model (LLM) benchmark has shown performance gains of over 2.8X. Eleven submissions in this round use the GPT-3 reference model, reflecting the tremendous popularity of generative AI.

“GPT-3 is among the fastest growing benchmarks we’ve launched,” said David Kanter, Executive Director, MLCommons. “It’s one of our goals to ensure that our benchmarks are representative of real-world workloads and it’s exciting to see 2.8X better performance in mere months.”

The MLPerf HPC benchmark is similar to MLPerf Training, but is specifically intended for high-performance computing systems that are commonly employed in leading-edge scientific research. It emphasizes training machine learning models for scientific applications and data, such as quantum molecular dynamics, and also incorporates an optional throughput metric for large systems that commonly support multiple users.

MLCommons added a new protein-folding benchmark in the HPC v3.0 benchmark suite: the OpenFold generative AI model, which predicts the 3D structure of a protein given a 1D amino acid sequence. Developed by Columbia University, OpenFold is an open-source reproduction of the AlphaFold 2 foundation model and has been the cornerstone of a large number of research projects since its creation.

MLPerf HPC v3.0 includes over 30 results, a 50% increase in participation over last year, with submissions from 8 organizations operating some of the world’s largest supercomputers: Clemson University Research Computing and Data, Dell, Fujitsu+RIKEN, HPE+Lawrence Berkeley National Laboratory, NVIDIA, and Texas Advanced Computing Center. MLCommons congratulates first-time MLPerf HPC submitters Clemson University Research Computing and Data and HPE+Lawrence Berkeley National Laboratory.

The new OpenFold benchmark includes submissions from 5 organizations, among them Clemson University Research Computing and Data, HPE+Lawrence Berkeley National Laboratory, NVIDIA, and Texas Advanced Computing Center.

The MLPerf HPC benchmark suite demonstrates considerable progress in AI for science that will help unlock new discoveries. For example, the DeepCAM weather modeling benchmark is 14X faster than when it debuted, illustrating how rapid innovations in machine learning systems can empower scientists with better tools to address critical research areas and advance our understanding of the world.

“The addition of OpenFold follows the spirit of the MLPerf HPC benchmark suite: accelerating workloads with potential for global-scale contribution. We are excited for the new addition as well as the increased participation in the latest submission round,” said Andreas Prodromou, MLCommons HPC co-chair.

MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make machine learning better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members, global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire machine learning industry through benchmarks and metrics, public datasets, and best practices.