Slidecast: For AMD, It’s Time to ROCm!

“AMD has been away from the HPC space for a while, but now they are coming back in a big way with an open software approach to GPU computing. The Radeon Open Compute Platform (ROCm) was born from the Boltzmann Initiative announced last year at SC15. Now available on GitHub, the ROCm Platform brings a rich foundation to advanced computing by better integrating the CPU and GPU to solve real-world problems.”
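
For readers curious what programming against ROCm looks like, the platform’s HIP layer exposes a CUDA-style C++ API and compiles with the ROCm stack’s hipcc. The sketch below is purely illustrative and is not taken from AMD’s slidecast or materials; the kernel and variable names are our own.

```cpp
// Minimal HIP vector-add sketch (illustrative only; names are hypothetical).
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Element-wise add kernel; __global__ and the thread-index builtins
// behave the same way they do in CUDA.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host-side input/output buffers.
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    // Device allocations.
    float *da, *db, *dc;
    hipMalloc((void**)&da, bytes);
    hipMalloc((void**)&db, bytes);
    hipMalloc((void**)&dc, bytes);

    // Copy inputs to the GPU.
    hipMemcpy(da, ha.data(), bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), bytes, hipMemcpyHostToDevice);

    // Launch one thread per element.
    const dim3 block(256);
    const dim3 grid((n + block.x - 1) / block.x);
    hipLaunchKernelGGL(vec_add, grid, block, 0, 0, da, db, dc, n);

    // Copy the result back and spot-check it.
    hipMemcpy(hc.data(), dc, bytes, hipMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);  // expect 3.0

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```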

Nvidia Expands Deep Learning Institute

Over at the Nvidia Blog, Jamie Beckett writes that the company is expanding its Deep Learning Institute with Microsoft and Coursera. The institute provides training to help people apply deep learning to solve challenging problems.

Nvidia Unveils World’s First GPU Design for Inferencing

Nvidia’s GPU platforms have been widely used on the training side of the deep learning equation for some time now. Today the company announced new Pascal-based GPUs tailor-made for the inferencing side of deep learning workloads. “With the Tesla P100 and now Tesla P4 and P40, NVIDIA offers the only end-to-end deep learning platform for the data center, unlocking the enormous power of AI for a broad range of industries,” said Ian Buck, general manager of accelerated computing at NVIDIA.

European SAVE Project Streamlines Data Intensive Computing

A consortium of European researchers and technology companies recently completed the EU-funded SAVE project, aimed at simplifying the execution of data-intensive applications on complex hardware architectures. Funded by the European Commission’s Seventh Framework Programme (FP7), the project was launched in 2013 under the project name ‘Self-Adaptive Virtualization-Aware High-Performance/Low-Energy Heterogeneous System Architectures’ (SAVE). The project, which was completed at the start of this month, has led to innovations in hardware, software, and operating system (OS) components.

Supermicro Shipping Servers with NVIDIA Tesla P100 GPUs

Supermicro’s density-optimized 4U SuperServer 4028GR-TR(T)2 supports up to 10 PCI-E Tesla P100 accelerators for up to 210 TFLOPS of FP16 peak performance with GPUDirect RDMA support. Supermicro’s innovative, GPU-optimized single-root-complex PCI-E design is proven to dramatically improve GPU peer-to-peer communication efficiency over QPI and PCI-E links, with up to 21% higher QPI throughput and 60% lower latency compared to previous-generation products. These 4U SuperServers support dual Intel Xeon processor E5-2600 v4/v3 product families, up to 3TB of DDR4-2400MHz memory, optional dual onboard 10GBase-T ports, and redundant Titanium Level (96%) digital power supplies.

One Stop Systems Shipping Platforms with NVIDIA Tesla P100 for PCIe

Today One Stop Systems (OSS) announced that its High Density Compute Accelerator (HDCA) and its Express Box 3600 (EB3600) are now available for purchase with the NVIDIA Tesla P100 for PCIe GPU. These high-density platforms deliver teraflop performance with greatly reduced cost and space requirements. The HDCA supports up to 16 Tesla P100s and the EB3600 supports up to 9 Tesla P100s. The Tesla P100 provides 4.7 TeraFLOPS of double-precision performance, 9.3 TeraFLOPS of single-precision performance and 18.7 TeraFLOPS of half-precision performance with NVIDIA GPU BOOST technology.
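
Taken at face value, those per-card figures imply that a fully populated HDCA (16 cards) would peak at roughly 16 × 4.7 ≈ 75 TFLOPS of double-precision throughput, and a full EB3600 (9 cards) at roughly 42 TFLOPS, before accounting for real-world efficiency losses.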

New OpenPOWER Servers Accelerate Deep Learning with NVLink

Today IBM unveiled a series of new servers designed to help propel cognitive workloads and to drive greater data center efficiency. Featuring a new POWER processor with NVLink, the Linux-based lineup incorporates innovations from the OpenPOWER community that deliver higher levels of performance and greater computing efficiency than are available on any x86-based server. “Collaboratively developed with some of the world’s leading technology companies, the new Power Systems are uniquely designed to propel artificial intelligence, deep learning, high performance data analytics and other compute-heavy workloads, which can help businesses and cloud service providers save money on data center costs.”

Powering Aircraft CFD with the Piz Daint Supercomputer

The Piz Daint supercomputer at the Swiss National Supercomputing Centre (CSCS) is again assisting researchers in competition for the prestigious Gordon Bell prize. “Researchers led by Peter Vincent from Imperial College London have made this year’s list of finalists for the Gordon Bell prize, with the backing of Piz Daint at the Swiss National Supercomputing Centre. The prize is awarded annually in November at SC, the world’s largest conference on supercomputing. It honors the success of scientists who are able to achieve very high efficiencies for their research codes running on the fastest supercomputer architectures currently available.”

New Bright for Deep Learning Solution Designed for Business

“We have enhanced Bright Cluster Manager 7.3 so our customers can quickly and easily deploy new deep learning techniques to create predictive applications for fraud detection, demand forecasting, click prediction, and other data-intensive analyses,” said Martijn de Vries, Chief Technology Officer of Bright Computing. “Going forward, customers using Bright to deploy and manage clusters for deep learning will not have to worry about finding, configuring, and deploying all of the dependent software components needed to run deep learning libraries and frameworks.”

Exascale Computing – What are the Goals and the Baseline?

Thomas Schulthess presented this talk at the MVAPICH User Group. “Implementation of exascale computing will be different in that application performance is supposed to play a central role in determining the system performance, rather than just considering floating point performance of the high-performance Linpack benchmark. This immediately raises the question as to what the yardstick will be, by which we measure progress towards exascale computing. I will discuss what type of performance improvements will be needed to reach kilometer-scale global climate and weather simulations. This challenge will probably require more than exascale performance.”