Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:


New MLPerf Benchmark Measures Machine Learning Inference Performance

Today a consortium involving over 40 leading companies and university researchers introduced MLPerf Inference v0.5, the first industry standard machine learning benchmark suite for measuring system performance and power efficiency. “Our goal is to create common and relevant metrics to assess new machine learning software frameworks, hardware accelerators, and cloud and edge computing platforms in real-life situations,” said David Kanter, co-chair of the MLPerf inference working group. “The inference benchmarks will establish a level playing field that even the smallest companies can use to compete.”

Scalable and Distributed DNN Training on Modern HPC Systems

DK Panda from Ohio State University gave this talk at the Swiss HPC Conference. “We will provide an overview of interesting trends in DNN design and how cutting-edge hardware architectures are playing a key role in moving the field forward. We will also present an overview of different DNN architectures and DL frameworks. Most DL frameworks started with a single-node/single-GPU design.”

Quobyte Distributed File System adds TensorFlow Plug-In for Machine Learning

Today Quobyte announced that the company’s Data Center File System is the first distributed file system to offer a TensorFlow plug-in, providing increased throughput performance and linear scalability for ML-powered applications to enable faster training across larger data sets while achieving higher-accuracy results. “By providing the first distributed file system with a TensorFlow plug-in, we are ensuring as much as a 30 percent faster throughput performance improvement for ML training workflows, helping companies better meet their business objectives through improved operational efficiency,” said Bjorn Kolbeck, Quobyte CEO.

Accelerating TensorFlow with RDMA for High-Performance Deep Learning

Xiaoyi Lu from Ohio State University gave this talk at the 2019 OpenFabrics Workshop in Austin. “Google’s TensorFlow is one of the most popular Deep Learning (DL) frameworks. We propose a unified way of achieving high performance through enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology on InfiniBand and RoCE. Through our proposed RDMAgRPC design, TensorFlow only needs to run over the gRPC channel and gets the optimal performance.”

Announcing Google’s New TPU Dev Board for Machine Learning on the Edge

Google just launched Coral, a Beta platform for building intelligent devices with local AI. To enable this initiative, Google is making an edge version of its TensorFlow Processing Unit available for sale for the first time. “Our first hardware components feature the new Edge TPU, a small ASIC designed by Google that provides high-performance ML inferencing for low-power devices. For example, it can execute state-of-the-art mobile vision models such as MobileNet V2 at 100+ fps, in a power efficient manner.”

Video: TensorFlow for HPC?

In this podcast, Peter Braam looks at how TensorFlow framework could be used to accelerate high performance computing. “Google has developed TensorFlow, a truly complete platform for ML. The performance of the platform is amazing, and it begs the question if it will be useful for HPC in a similar manner that GPU’s heralded a revolution.”

Using Ai to Automatically Diagnose Alzheimer’s Disease

Researchers from Stanford University have developed a deep learning based system that can automatically detect Alzheimer’s disease and its biomarkers from MRIs, with 94 percent accuracy. “Our method uses minimal preprocessing of MRIs (imposing minimum preprocessing artifacts) and utilizes a simple data augmentation strategy of downsampled MR images for training purposes,” the researchers stated in their paper.

Exploiting HPC Technologies for Accelerating Big Data Processing and Associated Deep Learning

DK Panda from Ohio State University gave this talk at the Swiss HPC Conference. “This talk will provide an overview of challenges in accelerating Hadoop, Spark, and Memcached on modern HPC clusters. An overview of RDMA-based designs for Hadoop (HDFS, MapReduce, RPC and HBase), Spark, Memcached, Swift, and Kafka using native RDMA support for InfiniBand and RoCE will be presented. Enhanced designs for these components to exploit NVM-based in-memory technology and parallel file systems (such as Lustre) will also be presented.”

Video: IBM Sets Record TensorFlow Performance with new Snap ML Software

In this video, researchers from IBM Research in Zurich describe how the new IBM Snap Machine Learning (Snap ML) software was able to achieve record performance running TesorFlow. “This training time is 46x faster than the best result that has been previously reported, which used TensorFlow on Google Cloud Platform to train the same model in 70 minutes.”

Google Cloud TPU Machine Learning Accelerators now in Beta

John Barrus writes that Cloud TPUs are available in beta on Google Cloud Platform to help machine learning experts train and run their ML models more quickly. “Cloud TPUs are a family of Google-designed hardware accelerators that are optimized to speed up and scale up specific ML workloads programmed with TensorFlow. Built with four custom ASICs, each Cloud TPU packs up to 180 teraflops of floating-point performance and 64 GB of high-bandwidth memory onto a single board.