Extreme Scale Deep Learning at NERSC


In this video from PASC18, Thorsten Kurth from Lawrence Berkeley National Laboratory presents: Extreme Scale Deep Learning at NERSC.

“We present various studies on very large scale distributed deep learning on HPC systems, including the ~10k-node Intel Xeon Phi-based Cori system at NERSC. We explore CNN classification architectures and generative adversarial networks for HEP problems, using large images corresponding to full LHC detectors and high-resolution cosmology convergence maps. We have explored distributed scaling in different deep-learning frameworks, including Caffe, TensorFlow and PyTorch, with different communication layers, i.e. Google gRPC or MPI-based approaches such as Intel MLSL, Uber’s Horovod and Cray’s CPE ML Plugin. We describe various approaches for scaling out the training of single models up to the full Cori system. We further discuss recent work contrasting performance across frameworks, systems and system architectures.”
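The MPI-based communication layers mentioned above (Intel MLSL, Horovod, Cray’s CPE ML Plugin) all rely on an allreduce to average gradients across workers each training step; ring-allreduce is the bandwidth-optimal variant popularized by Horovod. The sketch below is a pure-Python, in-process simulation of that pattern for illustration only — it is not code from the talk, and real deployments would use MPI ranks and the plugin libraries themselves.

```python
# Sketch of ring-allreduce gradient averaging, as used by MPI-based
# scaling plugins such as Horovod. Workers are simulated in-process;
# in a real job each worker would be a separate MPI rank.

def ring_allreduce(grads):
    """Average one gradient vector per worker via the ring algorithm.

    grads: list of equal-length lists, one gradient vector per worker.
    Returns the list of averaged vectors each worker ends up holding.
    """
    n = len(grads)
    dim = len(grads[0])
    assert dim % n == 0, "vector length must divide evenly into chunks"
    csize = dim // n
    # Split each worker's vector into n chunks (one chunk per worker).
    chunks = [[g[i * csize:(i + 1) * csize] for i in range(n)] for g in grads]

    # Phase 1: reduce-scatter. In n-1 steps, each worker passes a chunk to
    # its ring neighbor, which accumulates it. Afterwards, worker w holds
    # the fully summed chunk (w + 1) % n.
    for step in range(n - 1):
        sent = [(w, (w - step) % n, chunks[w][(w - step) % n][:])
                for w in range(n)]
        for w, c, data in sent:
            dst = (w + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], data)]

    # Phase 2: allgather. In n-1 more steps, the fully reduced chunks
    # circulate around the ring until every worker has all of them.
    for step in range(n - 1):
        sent = [(w, (w + 1 - step) % n, chunks[w][(w + 1 - step) % n][:])
                for w in range(n)]
        for w, c, data in sent:
            chunks[(w + 1) % n][c] = data

    # Divide by the worker count and reassemble each worker's full vector.
    return [[x / n for ch in chunks[w] for x in ch] for w in range(n)]

# Two simulated workers averaging their local gradients:
result = ring_allreduce([[1.0, 2.0], [3.0, 4.0]])
# Every worker ends up with the same averaged gradient: [2.0, 3.0]
```

Each of the 2(n-1) steps moves only 1/n of the vector per worker, which is why this pattern scales to thousands of nodes: per-worker traffic stays constant as the worker count grows.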

Thorsten Kurth works with the application readiness team to deliver optimized codes for Cori. He also acts as liaison for defining and demonstrating application portability between the three major US HPC facilities, i.e. NERSC, ALCF and OLCF. Before joining NERSC, Thorsten worked as a postdoc in the Nuclear Science Division at LBNL, where he developed and optimized codes for computing multi-baryon correlations in Lattice QCD. He received his PhD from the University of Wuppertal, Germany, in 2011, where he performed calculations for electroweak matrix elements, hadron masses and quark masses in Lattice QCD. Thorsten’s further interests include machine learning and data analysis.

See more talks in the PASC18 Video Gallery

Check out our insideHPC Events Calendar