MLCommons: MLPerf Results Show AI Performance Gains


Today MLCommons announced new results from two industry-standard MLPerf benchmark suites: Training v3.0, which measures the performance of training machine learning models, and Tiny v1.1, which measures how quickly a trained neural network can process new data on extremely low-power devices in the smallest form factors.

To view the results and find additional information about the benchmarks, visit: Training v3.0 and Tiny v1.1.

Training models faster empowers researchers to unlock new capabilities, such as the latest advances in generative AI. The latest MLPerf Training round demonstrates broad industry participation and highlights performance gains of up to 1.54x compared to just six months ago and 33-49x over the first round, reflecting the tremendous rate of innovation in systems for machine learning.

The MLPerf Training benchmark suite comprises full system tests that stress machine learning models, software, and hardware for a broad range of applications. The open-source and peer-reviewed benchmark suite provides a level playing field for competition that drives innovation, performance, and energy-efficiency for the entire industry.
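At its core, an MLPerf Training result is the wall-clock time a complete system needs to train a reference model to a fixed quality target, so lower is better. The Python sketch below illustrates that time-to-target idea; it is a simplified illustration with assumed placeholder functions (train_one_epoch, evaluate), not the official benchmark harness, which additionally fixes datasets, reference implementations, and run rules.

    # Minimal sketch of the "time to train to a quality target" idea behind an
    # MLPerf Training result. Illustrative only: train_one_epoch and evaluate
    # are assumed placeholders supplied by the caller.
    import time

    def time_to_target(train_one_epoch, evaluate, target_metric, max_epochs=100):
        """Return wall-clock seconds needed to reach target_metric, or None."""
        start = time.perf_counter()
        for _ in range(max_epochs):
            train_one_epoch()                       # one pass over the training data
            if evaluate() >= target_metric:         # e.g. check validation accuracy
                return time.perf_counter() - start
        return None                                 # quality target never reached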

In this round, MLPerf Training added two new benchmarks to the suite. The first is a large language model (LLM) using the GPT-3 reference model that reflects the rapid adoption of generative AI. The second is an updated recommender, modified to be more representative of industry practices, using the DLRM-DCNv2 reference model. These new tests help advance AI by ensuring that industry-standard benchmarks are representative of the latest trends in adoption and can help guide customers, vendors, and researchers alike.

“I’m excited to see the debut of GPT-3 and DLRM-DCNv2, which were built based on extensive feedback from the community and leading customers and demonstrate our commitment to keep the MLPerf benchmarks representative of modern machine learning,” said David Kanter, executive director of MLCommons.

The MLPerf Training v3.0 round includes over 250 performance results, an increase of 62% over the last round, from 16 different submitters: ASUSTek, Azure, Dell, Fujitsu, GIGABYTE, H3C, IEI, Intel & Habana Labs, Krai, Lenovo, NVIDIA, NVIDIA + CoreWeave, Quanta Cloud Technology, Supermicro, and xFusion. In particular, MLCommons would like to congratulate first time MLPerf Training submitters CoreWeave, IEI, and Quanta Cloud Technology.

“It is truly remarkable to witness system engineers continuously pushing the boundaries of performance on workloads that hold utmost value for users via MLPerf,” said Ritika Borkar, co-chair of the MLPerf Training Working Group. “We are particularly thrilled to incorporate an LLM benchmark in this round, as it will inspire system innovation for a workload that has the potential of revolutionizing countless applications.”

Tiny compute devices are a pervasive part of everyday life, from the tire sensors in your vehicle to your appliances and even your fitness tracker. They bring intelligence to daily life at very little cost.

ML inference on the edge is increasingly attractive to increase energy efficiency, privacy, responsiveness, and autonomy of edge devices. Tiny ML breaks the traditional paradigm of energy and compute hungry ML by eliminating networking overhead, allowing for greater overall efficiency and security relative to a cloud-centric approach. The MLPerf Tiny benchmark suite captures a variety of inference use cases that involve “tiny” neural networks, typically 100 kB and below, that process sensor data, such as audio and vision, to provide endpoint intelligence for low-power devices in the smallest form factors. MLPerf Tiny tests these capabilities in a fair and reproducible manner, in addition to offering optional power measurement.
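For a sense of scale, the sketch below builds a small keyword-spotting-style convolutional network of the kind described above and converts it to a quantized TensorFlow Lite model that fits well under 100 kB. It is illustrative only and not an official MLPerf Tiny reference model; the layer sizes and the 49x10 MFCC-like input shape are assumptions chosen to show the size class, not prescribed by the benchmark.

    # Illustrative only: a sub-100 kB model of the sort MLPerf Tiny measures.
    # Not an official reference model; architecture and input shape are assumptions.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(49, 10, 1)),                # e.g. MFCC frames x coefficients
        tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.DepthwiseConv2D((3, 3), padding="same", activation="relu"),
        tf.keras.layers.Conv2D(32, (1, 1), activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(12, activation="softmax"),  # e.g. 12 keyword classes
    ])
    model.summary()  # only a few thousand parameters

    # Post-training quantization shrinks the model toward the footprint typical of
    # MLPerf Tiny workloads (full int8 calibration would also need a representative
    # dataset; the default optimization here is a stand-in).
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    print(f"Converted model size: {len(tflite_model) / 1024:.1f} kB")

A model in this size class can run entirely on a microcontroller, which is why the suite’s optional power measurement is so relevant for battery-powered endpoints.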

In this round, the MLPerf Tiny v1.1 benchmarks received 10 submissions from academic and industry organizations and national labs, producing 159 peer-reviewed results. Submitters include: Bosch, cTuning, fpgaConvNet, Kai Jiang, Krai, Nuvoton, Plumerai, Skymizer, STMicroelectronics, and Syntiant. This round also includes 41 power measurements. MLCommons congratulates Bosch, cTuning, fpgaConvNet, Kai Jiang, Krai, Nuvoton, and Skymizer on their first submissions to MLPerf Tiny.

“I’m particularly excited to see so many companies embrace the Tiny ML benchmark suite,” said David Kanter, executive director of MLCommons. “We had 7 new submitters this round, which demonstrates the value and importance of a standard benchmark in enabling device makers and researchers to choose the best solution for their use case.”

“With so many new companies adopting the benchmark suite, we’ve really extended the range of hardware solutions and innovative software frameworks covered. The v1.1 release includes submissions ranging from tiny and inexpensive microcontrollers to larger FPGAs, showing a wide variety of design choices,” said Dr. Csaba Kiraly, co-chair of the MLPerf Tiny Working Group. “And the combined effect of software and hardware performance improvements is 1000-fold in some areas compared to our initial reference benchmark results, which shows the pace of innovation in the field.”