“The majority of deep learning frameworks provide good out-of-the-box performance on a single workstation, but scaling across multiple nodes is still a wild, untamed borderland. This discussion follows the story of one researcher trying to make use of a significant compute resource to accelerate learning over a large number of CPUs. Along the way we note how to find good multiple-CPU performance with Theano* and TensorFlow*, how to extend a single-machine model with MPI and optimize its performance as we scale out and up on both Intel Xeon and Intel Xeon Phi architectures.”
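The talk summarized above mentions extending a single-machine model with MPI. A common way to do this is synchronous data parallelism: each rank computes gradients on its own shard of the data, a sum-allreduce gives every rank the same total gradient, and every rank applies the same update so the model replicas stay identical. The sketch below simulates that pattern in a single plain-Python process. It assumes a toy linear model with a squared-error loss, and the helper names (`local_gradient`, `allreduce_sum`, `train`) are hypothetical. In a real MPI job, `allreduce_sum` would be replaced by a collective such as `MPI_Allreduce` with `MPI_SUM` (for example `comm.Allreduce` in mpi4py).

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a toy 1-D regression
Params = Tuple[float, float]  # (w, b) for the model y ≈ w*x + b


def local_gradient(params: Params, shard: List[Sample]) -> Tuple[float, float]:
    """Summed squared-error gradient over one rank's shard (not yet averaged)."""
    w, b = params
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb


def allreduce_sum(per_rank: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Hypothetical stand-in for MPI_Allreduce(MPI_SUM): every rank gets the total."""
    total = tuple(sum(v[i] for v in per_rank) for i in range(len(per_rank[0])))
    return [total for _ in per_rank]


def train(data: List[Sample], num_ranks: int = 4, lr: float = 0.1,
          steps: int = 2000) -> List[Params]:
    """Synchronous data-parallel gradient descent, simulated in one process."""
    shards = [data[r::num_ranks] for r in range(num_ranks)]  # round-robin split
    n = len(data)
    replicas: List[Params] = [(0.0, 0.0) for _ in range(num_ranks)]
    for _ in range(steps):
        grads = [local_gradient(p, s) for p, s in zip(replicas, shards)]
        summed = allreduce_sum(grads)
        # Divide by the global sample count so the step matches full-batch GD.
        replicas = [(w - lr * gw / n, b - lr * gb / n)
                    for (w, b), (gw, gb) in zip(replicas, summed)]
    return replicas
```

For example, fitting `y = 3x + 1` on 20 points with four simulated ranks recovers `w ≈ 3` and `b ≈ 1` on every replica. The replicas never diverge because each one applies the identical summed gradient. This is the property the real allreduce has to preserve as the job scales out.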
Are supercomputers practical for deep learning applications? Over at the Allinea Blog, Mark O’Connor writes that a recent experiment with machine learning optimization on the ARCHER supercomputer shows that relatively simple models run at sufficiently large scale can readily outperform more complex but less scalable models. “In the open science world, anyone running a HPC cluster can expect to see a surge in the number of people wanting to run deep learning workloads over the coming months.”
“Being ready with full support for Intel Xeon Phi from day one has been a key strategy for Allinea and underpins our approach for supporting customers, such as Los Alamos National Laboratory on the Trinity system, Argonne National Laboratory on Theta and NERSC on Cori, where work is now underway to port code and get applications ready for more complex science on a larger scale.”
“Science problems are becoming increasingly complex in all areas from physics and bioinformatics to engineering,” said Siegfried Hoefinger, High Performance Computing Specialist at VSC. “Bigger is better, but inefficiency will always limit what you can achieve. The Allinea tools will enable us to quickly establish the root cause of bottlenecks and understand the markers for inefficient code. By doing so we’re helping to prove the case for modernization, can start to eliminate inefficiencies and exploit latent capacity to its full effect.”
Today Allinea Software announced the availability of its new software release, version 6.1, which offers full support for developing parallel code for Nvidia’s Pascal GPU architecture and CUDA 8. “The addition of Allinea tools into the mix is an exciting one, enabling teams to accurately measure GPU utilization, employ smart optimization techniques and quickly develop new CUDA 8 code that is bug and bottleneck free,” said Mark O’Connor, VP of Product Management at Allinea.
Today, Allinea announced that the company will be exhibiting at XSEDE16, July 17-21, in Miami. The conference will attract an audience across industry and academia to discuss the key themes of diversity, big data and science at scale. “Our tools are used extensively across the XSEDE user base so we’re delighted to be extending the value they bring by giving practical advice for getting the best out of infrastructure capabilities through software tuning, especially given the addition of support for the full Intel Xeon Phi family in our new v6.1 software release,” said Rob Rick, VP Americas for Allinea.
“Our latest product enhancements will solidify our customers’ investment in the next-generation Intel Xeon Phi processor,” said Mark O’Connor, VP of Product Management at Allinea. “‘Knights Landing’ has the potential to unleash new capabilities for HPC code users, and our new release brings a powerful debugger, profiler and performance reports for tackling the essential preparatory work needed to optimize legacy code and realize the processor’s true potential for reducing software run times.”
The Flemish Supercomputer Center (VSC) is planning the deployment of a new NEC cluster that will represent Belgium’s largest investment in HPC to date. To help VSC unleash the potential of the system, Allinea software tools will be used to speed up code performance. “We are delighted to be supporting VSC in providing better education to its users around code efficiency,” said David Lecomber, CEO and Founder of Allinea. “The fact of the matter is, without visibility of code performance, researchers cannot get the full value from HPC. By appreciating how their code makes a difference to project delivery, researchers can achieve more for less cost. By underlining this best practice, VSC’s approach is one that is refreshing and makes great economic sense.”
“Cavium ThunderX has significant differentiation in the 64-bit ARM market as Cavium is the first ARMv8 vendor to deliver dual-socket support with full ARMv8.1 implementation and significant advantage in CPU cores with 48 cores per socket. In addition, ThunderX supports large memory capacity (512GB per socket, 1TB in a 2S system) with excellent memory bandwidth and low memory latency. ThunderX also includes multiple 10GbE/40GbE network interfaces delivering excellent IO throughput. These features enable ThunderX to deliver the core performance and scale-out capability that the HPC market requires.”
Allinea Software reports that the company is helping weather and climate researchers to adapt advanced weather models to better exploit today’s technology and get ready for future platforms. The company will address leading climatologists and meteorologists on best practices for scalable code development April 6-7 at the 4th ENES HPC Workshop. The session will reference the application of Allinea’s tools across more than 20 weather and climate customers worldwide.