Making AI Accessible to Any Size Enterprise

In this sponsored post, our friends at Lenovo and NetApp have teamed up with NVIDIA to discuss how the companies are helping to drive Artificial Intelligence (AI) into smaller organizations and, hopefully, seed that creative garden. Experience tells us that there is a relationship between organizational size and technology adoption: larger, more resource-rich enterprises generally adopt new technologies first, while smaller, more resource-constrained organizations follow afterward (provided the smaller organization isn’t in the technology business).

Inspur Re-Elected as Member of SPEC OSSC and Chair of SPEC Machine Learning

The Standard Performance Evaluation Corporation (SPEC) has finalized the election of new Open System Steering Committee (OSSC) executive members, which include Inspur, Intel, AMD, IBM, Oracle, and three other companies. “It is worth noting that Inspur, a re-elected OSSC member, was also re-elected as chair of the SPEC Machine Learning (SPEC ML) working group. The ML benchmark development plan proposed by Inspur has been approved by the members; it aims to provide users with a standard for evaluating machine learning computing performance.”

Efficient Model Selection for Deep Neural Networks on Massively Parallel Processing Databases

Frank McQuillan from Pivotal gave this talk at FOSDEM 2020. “In this session we will present an efficient way to train many deep learning model configurations at the same time with Greenplum, a free and open source massively parallel database based on PostgreSQL. The implementation involves distributing data to the workers that have GPUs available and hopping model state between those workers, without sacrificing reproducibility or accuracy.”
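
To make the “model hopping” idea concrete, here is a minimal sketch in plain Python, not the actual Greenplum/MADlib API: data shards stay pinned to their workers, and each model configuration’s state rotates across the shards, so every model trains on all the data without the data ever moving.

```python
# Illustrative sketch of model hopping; the function names are hypothetical.
# In Greenplum the inner loop runs in parallel, one model per GPU-equipped
# segment; it is sequential here for clarity.

def train_with_model_hopping(models, shards, n_cycles, train_one_pass):
    """models: list of model states; shards: list of data partitions;
    train_one_pass(state, shard) -> updated state."""
    n = len(shards)
    for _ in range(n_cycles):
        for hop in range(n):
            for i, state in enumerate(models):
                # Rotate: model i visits shard (i + hop) mod n, so after
                # n hops every model has seen every shard exactly once.
                models[i] = train_one_pass(state, shards[(i + hop) % n])
    return models
```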

Podcast: Advancing Deep Learning with Custom-Built Accelerators

In this Chip Chat podcast, Carey Kloss from Intel outlines the architecture and potential of the Intel Nervana NNP-T. He gets into major issues like memory, how the architecture was designed to avoid problems like becoming memory-bound, how the accelerator supports existing software frameworks like PaddlePaddle and TensorFlow, and what the NNP-T means for customers who want to keep an eye on power usage and lower TCO.

New MLPerf Benchmark Measures Machine Learning Inference Performance

Today a consortium involving over 40 leading companies and university researchers introduced MLPerf Inference v0.5, the first industry-standard machine learning benchmark suite for measuring system performance and power efficiency. “Our goal is to create common and relevant metrics to assess new machine learning software frameworks, hardware accelerators, and cloud and edge computing platforms in real-life situations,” said David Kanter, co-chair of the MLPerf inference working group. “The inference benchmarks will establish a level playing field that even the smallest companies can use to compete.”
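
As a rough illustration of the kind of metric the suite standardizes, the sketch below measures single-stream latency for an arbitrary system under test. This is not the official harness (MLPerf supplies a LoadGen library for that), and the 90th-percentile target is our reading of the v0.5 single-stream scenario.

```python
import time
import numpy as np

def single_stream_p90(predict, sample, n_queries=1000):
    """Issue queries one at a time, as in a single-stream scenario,
    and report the 90th-percentile latency in seconds.
    `predict` is any callable wrapping the system under test."""
    latencies = []
    for _ in range(n_queries):
        start = time.perf_counter()
        predict(sample)  # one inference query
        latencies.append(time.perf_counter() - start)
    return float(np.percentile(latencies, 90))
```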

Scalable and Distributed DNN Training on Modern HPC Systems

DK Panda from Ohio State University gave this talk at the Swiss HPC Conference. “We will provide an overview of interesting trends in DNN design and how cutting-edge hardware architectures are playing a key role in moving the field forward. We will also present an overview of different DNN architectures and DL frameworks. Most DL frameworks started with a single-node/single-GPU design.”
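
As a concrete picture of moving beyond that single-node/single-GPU starting point, here is a minimal data-parallel sketch using Horovod over MPI. This is one common approach, not the specific MVAPICH2-based designs presented in the talk.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU, e.g.: mpirun -np 8 python train.py

# Pin each process to a single local GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

model = tf.keras.applications.ResNet50(weights=None)
# Scale the learning rate with the worker count and wrap the optimizer so
# gradients are averaged across all ranks with allreduce at each step.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)
```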

Quobyte Distributed File System adds TensorFlow Plug-In for Machine Learning

Today Quobyte announced that the company’s Data Center File System is the first distributed file system to offer a TensorFlow plug-in, providing increased throughput and linear scalability for ML-powered applications to enable faster training across larger data sets while achieving higher-accuracy results. “By providing the first distributed file system with a TensorFlow plug-in, we are delivering up to a 30 percent throughput improvement for ML training workflows, helping companies better meet their business objectives through improved operational efficiency,” said Bjorn Kolbeck, Quobyte CEO.
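
Usage-wise, TensorFlow resolves file systems by URI scheme, so a registered plug-in can serve reads directly rather than going through a kernel mount. The sketch below assumes a hypothetical “quobyte://” scheme and path; we have not confirmed the exact scheme Quobyte registers.

```python
import tensorflow as tf

# Hypothetical input pipeline reading TFRecords through a file-system
# plug-in; the "quobyte://" scheme and paths are placeholders.
files = tf.data.Dataset.list_files("quobyte://volume/training/*.tfrecord")
dataset = (tf.data.TFRecordDataset(files,
                                   num_parallel_reads=tf.data.AUTOTUNE)
           .shuffle(10_000)
           .batch(256)
           .prefetch(tf.data.AUTOTUNE))
```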

Accelerating TensorFlow with RDMA for High-Performance Deep Learning

Xiaoyi Lu from Ohio State University gave this talk at the 2019 OpenFabrics Workshop in Austin. “Google’s TensorFlow is one of the most popular Deep Learning (DL) frameworks. We propose a unified way of achieving high performance by enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology over InfiniBand and RoCE. With our proposed RDMA-gRPC design, TensorFlow only needs to run over the gRPC channel and still achieves optimal performance.”
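
On the application side, the legacy distributed runtime selects its transport when each process starts a server; the point of the design above is that programs keep the plain “grpc” protocol while the RDMA path is used transparently underneath. A minimal sketch, with placeholder host names:

```python
import tensorflow.compat.v1 as tf  # legacy (TF1-style) distributed runtime

cluster = tf.train.ClusterSpec({
    "ps": ["node0:2222"],                   # placeholder addresses
    "worker": ["node1:2222", "node2:2222"],
})
# With an RDMA-enhanced gRPC runtime, protocol stays "grpc"; stock
# TensorFlow also shipped alternatives such as "grpc+verbs".
server = tf.train.Server(cluster, job_name="worker", task_index=0,
                         protocol="grpc")
server.join()
```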

Announcing Google’s New TPU Dev Board for Machine Learning on the Edge

Google just launched Coral, a beta platform for building intelligent devices with local AI. To enable this initiative, Google is making an edge version of its Tensor Processing Unit (TPU) available for sale for the first time. “Our first hardware components feature the new Edge TPU, a small ASIC designed by Google that provides high-performance ML inferencing for low-power devices. For example, it can execute state-of-the-art mobile vision models such as MobileNet V2 at 100+ fps in a power-efficient manner.”
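
For reference, inference on the Edge TPU goes through the TensorFlow Lite runtime with the Edge TPU delegate. The sketch below follows Coral’s documented Python flow; the model file name is a placeholder.

```python
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path="mobilenet_v2_edgetpu.tflite",  # placeholder compiled model
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # stand-in camera frame
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()  # executes on the Edge TPU ASIC
scores = interpreter.get_tensor(out["index"])
```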

Video: TensorFlow for HPC?

In this podcast, Peter Braam looks at how the TensorFlow framework could be used to accelerate high performance computing. “Google has developed TensorFlow, a truly complete platform for ML. The performance of the platform is amazing, which raises the question of whether it will be useful for HPC, much as GPUs heralded a revolution.”
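
As a small taste of what “TensorFlow for HPC” could look like, here is a hedged sketch with no machine learning in it at all: one Jacobi relaxation step for a 2-D heat equation, written in TensorFlow ops so the runtime can compile and run it on a GPU.

```python
import tensorflow as tf

@tf.function
def jacobi_step(u):
    # Average the four neighbors of every interior grid point.
    interior = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                       u[1:-1, :-2] + u[1:-1, 2:])
    # Pad back to the full grid, keeping a fixed zero boundary.
    return tf.pad(interior, [[1, 1], [1, 1]])

u = tf.random.uniform((4096, 4096))
for _ in range(100):
    u = jacobi_step(u)  # runs as a compiled graph on the available device
```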