“Deep learning developers and researchers want to train neural networks as fast as possible. Right now we are limited by computing performance,” said Dr. Diamos. “The first step in improving performance is to measure it, so we created DeepBench and are opening it up to the deep learning community. We believe that tracking performance on different hardware platforms will help processor designers better optimize their hardware for deep learning applications.”
Oak Ridge National Lab is hosting a 3-day GPU Mini-hackathon led by experts from the OLCF and Nvidia. The event takes place Nov. 1-3 in Knoxville, Tennessee. “General-purpose Graphics Processing Units (GPGPUs) potentially offer exceptionally high memory bandwidth and performance for a wide range of applications. The challenge in utilizing such accelerators has been the difficulty in programming them. This event will introduce you to GPU programming techniques.”
This week Minimal Metrics announced an early-adopter program for PerfMiner, which uses lightweight, and pervasive performance data collection technology, automates its collection, and mines the data for key performance indicators. These indicators were developed through Minimal Metrics’ extensive experience tuning HPC and enterprise application performance, presented in an audience-specific, drill-down hierarchy that provides accountability for site productivity down to the performance of individual application threads.
In this RCE Podcast, Brock Palen and Jeff Squyres speak with Gregory Kurtzer about Singularity, a container solution for HPC and research environments. “Singularity allows a non-privileged user to “swap out” the operating system on the host for one they control. So if the host system is running RHEL6 but your application runs in Ubuntu, you can create an Ubuntu image, install your applications into that image, copy the image to another host, and run your application on that host in it’s native Ubuntu environment.”
“PSyclone was developed for the UK Met Office and is now a part of the build system for Dynamo, the dynamical core currently in development for the Met Office’s ‘next generation’ weather and climate model software. By generating the complex code needed to make use of thousands of processors, PSyclone leaves the Met Office scientists free to concentrate on the science aspects of the model. This means that they will not have to change their code from something that works on a single processing unit (or core) to something that runs on many thousands of cores.”
“PushToCompute is the easiest and most advanced DevOps pipeline for high performance applications available today”, said Nimbix CTO Leo Reiter. “It seamlessly enables serverless computing of even the most complex workflows, greatly simplifying application deployment at scale, and eliminating the need for any platform orchestration or user interface work. Developers simply focus on their specific functionality, rather than on building cloud capabilities into their applications.”
Today, the European Adept Project wrapped up by releasing a set of open-source energy measurement tools. “This is a significant step forward in understanding where exactly in a parallel computing system energy is consumed,” says Dr Michèle Weiland, Project Coordinator for Adept. “Giving both hardware and software developers access to this type of information allows them to make informed choices about new implementations, without guesswork.”
“AMD has been away from the HPC space for a while, but now they are coming back in a big way with an open software approach to GPU computing. The Radeon Open Compute Platform (ROCm) was born from the Boltzmann Initiative announced last year at SC15. Now available on GitHub, the ROCm Platform bringing a rich foundation to advanced computing by better integrating the CPU and GPU to solve real-world problems.”
Vectorization and threading are critical to using such innovative hardware product such as the Intel Xeon Phi processor. Using tools early in the design and development processor that identify where vectorization can be used or improved will lead to increased performance of the overall application. Modern tools can be used to determine what might be blocking compiler vectorization and the potential gain from the work involved.