Quobyte Distributed File System adds TensorFlow Plug-In for Machine Learning

Today Quobyte announced that the company’s Data Center File System is the first distributed file system to offer a TensorFlow plug-in. According to Quobyte, the plug-in increases throughput and provides linear scalability for ML-powered applications, enabling faster training on larger data sets with higher-accuracy results. “By providing the first distributed file system with a TensorFlow plug-in, we are ensuring as much as a 30 percent faster throughput performance improvement for ML training workflows, helping companies better meet their business objectives through improved operational efficiency,” said Bjorn Kolbeck, Quobyte CEO.
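
To put the “30 percent” figure in perspective, here is a back-of-the-envelope sketch. It assumes, unrealistically, a training job whose runtime is bound entirely by file-system throughput. With data volume $D$ and baseline throughput $B$, a 30 percent throughput gain shortens the I/O time by roughly 23 percent, not 30:

$$
T_{\text{new}} = \frac{D}{1.3\,B} = \frac{T_{\text{old}}}{1.3} \approx 0.77\,T_{\text{old}}
$$

Real training jobs overlap I/O with compute, so the end-to-end gain would typically be smaller.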

Accelerating TensorFlow with RDMA for High-Performance Deep Learning

Xiaoyi Lu from Ohio State University gave this talk at the 2019 OpenFabrics Workshop in Austin. “Google’s TensorFlow is one of the most popular Deep Learning (DL) frameworks. We propose a unified way of achieving high performance through enhancing the gRPC runtime with Remote Direct Memory Access (RDMA) technology on InfiniBand and RoCE. Through our proposed RDMA-gRPC design, TensorFlow only needs to run over the gRPC channel and gets the optimal performance.”

Announcing Google’s New TPU Dev Board for Machine Learning on the Edge

Google just launched Coral, a beta platform for building intelligent devices with local AI. To enable this initiative, Google is making an edge version of its Tensor Processing Unit (TPU) available for sale for the first time. “Our first hardware components feature the new Edge TPU, a small ASIC designed by Google that provides high-performance ML inferencing for low-power devices. For example, it can execute state-of-the-art mobile vision models such as MobileNet V2 at 100+ fps, in a power efficient manner.”

Video: TensorFlow for HPC?

In this podcast, Peter Braam looks at how the TensorFlow framework could be used to accelerate high performance computing. “Google has developed TensorFlow, a truly complete platform for ML. The performance of the platform is amazing, and it begs the question if it will be useful for HPC in a similar manner that GPU’s heralded a revolution.”

Using AI to Automatically Diagnose Alzheimer’s Disease

Researchers from Stanford University have developed a deep-learning-based system that can automatically detect Alzheimer’s disease and its biomarkers from MRIs with 94 percent accuracy. “Our method uses minimal preprocessing of MRIs (imposing minimum preprocessing artifacts) and utilizes a simple data augmentation strategy of downsampled MR images for training purposes,” the researchers stated in their paper.

Exploiting HPC Technologies for Accelerating Big Data Processing and Associated Deep Learning

DK Panda from Ohio State University gave this talk at the Swiss HPC Conference. “This talk will provide an overview of challenges in accelerating Hadoop, Spark, and Memcached on modern HPC clusters. An overview of RDMA-based designs for Hadoop (HDFS, MapReduce, RPC and HBase), Spark, Memcached, Swift, and Kafka using native RDMA support for InfiniBand and RoCE will be presented. Enhanced designs for these components to exploit NVM-based in-memory technology and parallel file systems (such as Lustre) will also be presented.”

Video: IBM Sets Record TensorFlow Performance with new Snap ML Software

In this video, researchers from IBM Research in Zurich describe how the new IBM Snap Machine Learning (Snap ML) software was able to achieve record performance running TensorFlow. “This training time is 46x faster than the best result that has been previously reported, which used TensorFlow on Google Cloud Platform to train the same model in 70 minutes.”
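
Working backward from the quoted figures gives the implied Snap ML training time. This is approximate, since the 46x factor is itself rounded:

$$
\frac{70\ \text{min}}{46} = \frac{4200\ \text{s}}{46} \approx 91\ \text{s}
$$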

Google Cloud TPU Machine Learning Accelerators now in Beta

John Barrus writes that Cloud TPUs are available in beta on Google Cloud Platform to help machine learning experts train and run their ML models more quickly. “Cloud TPUs are a family of Google-designed hardware accelerators that are optimized to speed up and scale up specific ML workloads programmed with TensorFlow. Built with four custom ASICs, each Cloud TPU packs up to 180 teraflops of floating-point performance and 64 GB of high-bandwidth memory onto a single board.”
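
If the quoted per-board totals are split evenly across the four ASICs (an assumption; the quote gives only board-level numbers), the per-chip figures work out to:

$$
\frac{180\ \text{TFLOPS}}{4} = 45\ \text{TFLOPS per chip}, \qquad \frac{64\ \text{GB}}{4} = 16\ \text{GB of high-bandwidth memory per chip}
$$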

High Performance Inferencing with TensorRT

Chris Gottbrath from NVIDIA gave this talk at SC17 in Denver. “This talk will introduce the TensorRT Programmable Inference Accelerator which enables high throughput and low latency inference on clusters with NVIDIA V100, P100, P4 or P40 GPUs. TensorRT is both an optimizer and runtime – users provide a trained neural network and can easily create highly efficient inference engines that can be incorporated into larger applications and services.”

Scaling Deep Learning Algorithms on Extreme Scale Architectures

Abhinav Vishnu from PNNL gave this talk at the MVAPICH User Group. “Deep Learning (DL) is ubiquitous. Yet leveraging distributed memory systems for DL algorithms is incredibly hard. In this talk, we will present approaches to bridge this critical gap. Our results will include validation on several US supercomputer sites such as Berkeley’s NERSC, Oak Ridge Leadership Computing Facility, and PNNL Institutional Computing.”
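
The abstract does not spell out its approaches. As a hedged illustration of one widely used way to spread DL training across distributed memory, synchronous data parallelism, here is a minimal pure-Python sketch. It is not PNNL’s implementation: the function names are hypothetical, the model is a toy 1-D linear regression, and the allreduce that would run over MPI on a real cluster is simulated in-process. Because every worker holds an equal-sized shard, averaging the local gradients reproduces the full-batch gradient exactly.

```python
"""Hypothetical sketch of synchronous data-parallel SGD with gradient averaging."""
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a toy model y = w * x


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient d/dw of the mean squared error mean((w*x - y)^2) over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> float:
    """In-process stand-in for an MPI-style allreduce that averages worker values."""
    return sum(values) / len(values)


def split(data: List[Sample], n_workers: int) -> List[List[Sample]]:
    """Split data into equal-sized contiguous shards, one per worker."""
    if len(data) % n_workers != 0:
        raise ValueError("data length must be divisible by n_workers")
    size = len(data) // n_workers
    return [data[i * size:(i + 1) * size] for i in range(n_workers)]


def train(data: List[Sample], n_workers: int, w: float = 0.0,
          lr: float = 0.01, steps: int = 200) -> float:
    """Each step: every worker computes a local gradient, then all apply the average."""
    shards = split(data, n_workers)
    for _ in range(steps):
        # On a real cluster these would run concurrently, one per node or GPU.
        grads = [local_gradient(w, shard) for shard in shards]
        w -= lr * allreduce_mean(grads)
    return w


if __name__ == "__main__":
    toy_data = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples of y = 3x
    print(train(toy_data, n_workers=4))
```

The equal-shard requirement is the design choice that keeps the averaged update identical to single-node training; uneven shards would need a sample-count-weighted average instead.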