In this podcast, the Radio Free HPC team recaps the ASC16 Student Cluster Competition in China and the 2016 MSST Conference in Santa Clara. Dan spent a week in Wuhan interviewing ASC16 student teams, and he came back impressed with the LINPACK benchmark tricks from the team at Zhejiang University, who set a new student LINPACK record of 12.03 TFlop/s. Meanwhile, Rich was in Santa Clara for the MSST conference, where he captured two days of talks on Mass Storage Technologies.
In this video from the GPU Hackathon at the University of Delaware, attendees tune their code to accelerate their application performance. The 5-day intensive GPU programming Hackathon was held in collaboration with Oak Ridge National Lab (ORNL). “Thanks to a partnership with NASA Langley Research Center, Oak Ridge National Laboratory, National Cancer Institute, National Institutes of Health (NIH), Brookhaven National Laboratory and the UD College of Engineering, UD students had access to the world’s second largest supercomputer — the Titan — to help solve real-world problems.”
Over at the Nvidia Blog, George Millington writes that, for the fourth consecutive year, the Nvidia Tesla Accelerated Computing Platform helped set new milestones in the Asia Student Supercomputer Challenge, the world’s largest supercomputer competition.
In this video from the 2016 GPU Technology Conference, Greg Schmidt from Hewlett Packard Enterprise describes the new Apollo 6500 server. “With up to eight high performance NVIDIA GPU cards designed for maximum transfer bandwidth, the HPE Apollo 6500 System is purpose-built for deep learning applications. Its high ratio of GPUs to CPUs, dense 4U form factor and efficient design enable organizations to run deep learning recommendation algorithms faster and more efficiently, significantly reducing model training time and accelerating the delivery of real-time results, all while controlling costs.”
In this video from the 2016 GPU Technology Conference, Rich Friedrich from Hewlett Packard Enterprise describes how the company makes it easier for Data Scientists to program GPUs. “In April, HPE announced a public, open-source version of the platform called the Cognitive Computing Toolkit. Instead of relying on the traditional CPUs that power most computers, the Toolkit runs on graphics processing units (GPUs), inexpensive chips designed for video game applications.”
Gregory Stoner from AMD presented this talk at the HPC User Forum. “With the announcement of the Boltzmann Initiative and the recent releases of ROCK and ROCR, AMD has ushered in a new era of Heterogeneous Computing. The Boltzmann initiative exposes cutting edge compute capabilities and features on targeted AMD/ATI Radeon discrete GPUs through an open source software stack. The Boltzmann stack is comprised of several components based on open standards, but extended so important hardware capabilities are not hidden by the implementation.”
The NVIDIA DGX-1 features up to 170 teraflops of half precision (FP16) peak performance, 8 Tesla P100 GPU accelerators with 16GB of memory per GPU, 7TB SSD DL Cache, and an NVLink Hybrid Cube Mesh. Packaged with fully integrated hardware and easily deployed software, it is billed as the world’s first system built specifically for deep learning, and it is built around NVIDIA’s Pascal-powered Tesla P100 accelerators, interconnected with NVLink. NVIDIA designed the DGX-1 to meet the never-ending computing demands of artificial intelligence and claims it can deliver the throughput of 250 CPU-based servers in a single box.
In this podcast, the Radio Free HPC team recaps the GPU Technology Conference, which wrapped up last week in San Jose.
Since Rich is traveling around in some desert somewhere, Dan and Henry go it alone and discuss the new Pascal (P100) GPU, NVIDIA’s new server, and what happened at the concurrent OpenPOWER conference.
“Cavium ThunderX has significant differentiation in the 64-bit ARM market as Cavium is the first ARMv8 vendor to deliver dual socket support with full ARMv8.1 implementation and significant advantage in CPU cores with 48 cores per socket. In addition, ThunderX supports large memory capacity (512GB per socket, 1TB in a 2S system) with excellent memory bandwidth and low memory latency. In addition, ThunderX includes multiple 10 GbE / 40GbE network interfaces delivering excellent IO throughput. These features enable ThunderX to deliver the core performance & scale out capability that the HPC market requires.”
In this video from the HPC User Forum in Tucson, Earl Joseph from IDC presents: 2016 IDC HPC Market Update. “The HPC User Forum was established in 1999 to promote the health of the global HPC industry and address issues of common concern to users. The organization has since grown to 150 members.”