“AMD has been away from the HPC space for a while, but now they are coming back in a big way with an open software approach to GPU computing. The Radeon Open Compute Platform (ROCm) was born from the Boltzmann Initiative announced last year at SC15. Now available on GitHub, the ROCm Platform brings a rich foundation to advanced computing by better integrating the CPU and GPU to solve real-world problems.”
Over at the Nvidia Blog, Jamie Beckett writes that the company is expanding its Deep Learning Institute with Microsoft and Coursera. The institute provides training to help people apply deep learning to solve challenging problems.
Nvidia’s GPU platforms have been widely used on the training side of the Deep Learning equation for some time now. Today the company announced new Pascal-based GPUs tailor-made for the inferencing side of Deep Learning workloads. “With the Tesla P100 and now Tesla P4 and P40, NVIDIA offers the only end-to-end deep learning platform for the data center, unlocking the enormous power of AI for a broad range of industries,” said Ian Buck, general manager of accelerated computing at NVIDIA.
A consortium of European researchers and technology companies recently completed the EU-funded SAVE project, aimed at simplifying the execution of data-intensive applications on complex hardware architectures. Funded by the European Commission’s Seventh Framework Programme (FP7), the project was launched in 2013 under the name ‘Self-Adaptive Virtualization-Aware High-Performance/Low-Energy Heterogeneous System Architectures’ (SAVE). The project, which was completed at the start of this month, has led to innovations in hardware, software, and operating system (OS) components.
Supermicro’s density-optimized 4U SuperServer 4028GR-TR(T)2 supports up to 10 PCI-E Tesla P100 accelerators for up to 210 TFLOPS FP16 peak performance with GPU Direct RDMA support. Supermicro’s innovative, GPU-optimized single-root-complex PCI-E design is proven to dramatically improve GPU peer-to-peer communication efficiency over QPI and PCI-E links, with up to 21% higher QPI throughput and 60% lower latency compared to previous-generation products. These 4U SuperServers support dual Intel Xeon processor E5-2600 v4/v3 product families, up to 3TB of DDR4-2400MHz memory, optional dual onboard 10GBase-T ports, and redundant Titanium Level (96%) digital power supplies.
Today One Stop Systems (OSS) announced that its High Density Compute Accelerator (HDCA) and its Express Box 3600 (EB3600) are now available for purchase with the NVIDIA Tesla P100 for PCIe GPU. These high-density platforms deliver teraflop performance with greatly reduced cost and space requirements. The HDCA supports up to 16 Tesla P100s and the EB3600 supports up to 9 Tesla P100s. The Tesla P100 provides 4.7 TeraFLOPS of double-precision performance, 9.3 TeraFLOPS of single-precision performance and 18.7 TeraFLOPS of half-precision performance with NVIDIA GPU BOOST technology.
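A quick sanity check on those numbers: aggregate peak throughput for these chassis is simply the per-GPU Tesla P100 figures quoted above multiplied by the GPU count. The sketch below (the helper function is illustrative, not from any vendor tool) uses the PCIe P100 peaks from the announcement; these are theoretical maximums with GPU Boost, and sustained application performance will be lower.

```python
# Per-GPU peak throughput (TFLOPS) for the Tesla P100 for PCIe,
# as quoted in the OSS announcement above.
P100_TFLOPS = {"fp64": 4.7, "fp32": 9.3, "fp16": 18.7}

def aggregate_peak(num_gpus, precision):
    """Theoretical aggregate peak in TFLOPS for num_gpus P100 cards."""
    return num_gpus * P100_TFLOPS[precision]

# HDCA: up to 16 Tesla P100s
print(round(aggregate_peak(16, "fp32"), 1))  # aggregate single-precision peak
# EB3600: up to 9 Tesla P100s
print(round(aggregate_peak(9, "fp64"), 1))   # aggregate double-precision peak
```

A fully populated HDCA, for example, works out to roughly 75 TFLOPS of double-precision peak (16 × 4.7), which is why these dense expansion platforms are attractive for space-constrained deployments.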
The Piz Daint supercomputer at the Swiss National Supercomputing Centre (CSCS) is again assisting researchers in competition for the prestigious Gordon Bell prize. “Researchers led by Peter Vincent from Imperial College London have made this year’s list of finalists for the Gordon Bell prize, with the backing of Piz Daint at the Swiss National Supercomputing Centre. The prize is awarded annually in November at SC, the world’s largest conference on supercomputing. It honors the success of scientists who are able to achieve very high efficiencies for their research codes running on the fastest supercomputer architectures currently available.”
“We have enhanced Bright Cluster Manager 7.3 so our customers can quickly and easily deploy new deep learning techniques to create predictive applications for fraud detection, demand forecasting, click prediction, and other data-intensive analyses,” said Martijn de Vries, Chief Technology Officer of Bright Computing. “Going forward, customers using Bright to deploy and manage clusters for deep learning will not have to worry about finding, configuring, and deploying all of the dependent software components needed to run deep learning libraries and frameworks.”
Thomas Schulthess presented this talk at the MVAPICH User Group. “Implementation of exascale computing will be different in that application performance is supposed to play a central role in determining the system performance, rather than just considering floating point performance of the high-performance Linpack benchmark. This immediately raises the question as to what the yardstick will be, by which we measure progress towards exascale computing. I will discuss what type of performance improvements will be needed to reach kilometer-scale global climate and weather simulations. This challenge will probably require more than exascale performance.”