New MLPerf Benchmark Measures Machine Learning Inference Performance


Today a consortium involving over 40 leading companies and university researchers introduced MLPerf Inference v0.5, the first industry standard machine learning benchmark suite for measuring system performance and power efficiency. The benchmark suite covers models applicable to a wide range of applications (autonomous driving, natural language processing, and many more) on a variety of form factors (smartphones, PCs, edge servers, and cloud computing platforms in the data center).

MLPerf Inference v0.5 uses a combination of carefully selected models and data sets to ensure that the results are relevant to real-world applications and will push the state-of-the-art forward.

By measuring inference, this benchmark suite will give valuable information on how quickly a trained neural network can process new data to provide useful insights. MLPerf previously released the companion Training v0.5 benchmark suite, which led to 29 different results measuring the performance of cutting-edge systems for training deep neural networks.

“The new MLPerf inference benchmarks will accelerate the development of hardware and software needed to unlock the full potential of ML applications,” stated Vijay Janapa Reddi, Associate Professor, Harvard University, and MLPerf Inference working group Co-Chair. “It will also stimulate innovation within the academic and research communities.”

MLPerf Inference v0.5 consists of five benchmarks, focused on three common ML tasks:

  • Image Classification – predicting a “label” for a given image from the ImageNet dataset, such as identifying items in a photo.
  • Object Detection – picking out an object using a bounding box within an image from the MS-COCO dataset, a task common in robotics, automation, and the automotive industry.
  • Machine Translation – translating sentences between English and German using the WMT English-German benchmark, similar to auto-translate features in widely used chat and email applications.

MLPerf provides reference code implementations for each benchmark that define the problem, model, and quality target, along with instructions for running the benchmarks. The reference implementations are available in the ONNX, PyTorch, and TensorFlow frameworks. The MLPerf inference benchmark working group follows an “agile” benchmarking methodology: launching early, involving a broad and open community, and iterating rapidly. The mlperf.org website provides the complete specification, with guidelines on the reference code, and will track future results.
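To make the measurement concrete, below is a minimal, hypothetical sketch of the kind of timing an inference benchmark performs: repeated single-image classification with a ResNet-50 model in PyTorch (one of the supported frameworks), summarized as a tail-latency percentile. It is not the official MLPerf harness, and the model choice, input shape, query count, and 90th-percentile summary are illustrative assumptions rather than part of the specification.

```python
# Hypothetical sketch of a single-stream inference latency measurement.
# This is NOT the official MLPerf reference code; it only illustrates the idea
# of timing repeated single-sample inference on a trained network.
import time

import torch
import torchvision.models as models

model = models.resnet50()              # illustrative stand-in for a deployed model
model.eval()                           # inference mode: disable dropout, freeze batch-norm stats

sample = torch.randn(1, 3, 224, 224)   # stand-in for one preprocessed ImageNet image

latencies = []
with torch.no_grad():                  # gradients are not needed for inference
    for _ in range(100):               # illustrative number of queries
        start = time.perf_counter()
        _ = model(sample)
        latencies.append(time.perf_counter() - start)

latencies.sort()
p90 = latencies[int(0.9 * len(latencies))]   # approximate 90th-percentile latency
print(f"Approximate 90th-percentile latency: {p90 * 1000:.1f} ms")
```

The actual reference implementations go further: they run the models on the real benchmark datasets, check accuracy against the defined quality targets, and control how queries are generated and measured.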

“Our goal is to create common and relevant metrics to assess new machine learning software frameworks, hardware accelerators, and cloud and edge computing platforms in real-life situations,” said David Kanter, co-chair of the MLPerf inference working group. “The inference benchmarks will establish a level playing field that even the smallest companies can use to compete.”

The inference benchmarks were created thanks to the contributions and leadership of our members over the last 11 months, including: Arm, Cadence, Centaur Technology, Dividiti, Facebook, Futurewei, General Motors, Google, Habana Labs, Harvard University, Intel, MediaTek, Microsoft, Myrtle, Nvidia, Real World Insights, University of Toronto, and Xilinx.

“MLPerf provides a clear goal post for organizations to align their research and development efforts and guide investment and purchasing decisions,” said Peter Mattson, General Chair of MLPerf. “The MLPerf inference benchmarks are a major step toward that goal.” Now that the new benchmark suite has been released, organizations can submit results that demonstrate the benefits of their ML systems on these benchmarks. Interested organizations should contact info@mlperf.org.
