In this video from the GPU Technology Conference, Thorsten Kurth from Lawrence Berkeley National Laboratory and Josh Romero from NVIDIA present: Exascale Deep Learning for Climate Analytics.
“We’ll discuss how we scaled the training of a single deep learning model to 27,360 V100 GPUs (4,560 nodes) on the OLCF Summit HPC System using the high-productivity TensorFlow framework. We’ll cover how the neural network was tweaked to achieve good performance on NVIDIA Volta GPUs with Tensor Cores, and what further optimizations were necessary to provide excellent scalability, including data input pipeline and communication optimizations, as well as gradient boosting for SGD-type solvers. Scalable deep learning is becoming increasingly important as datasets and deep learning models grow in size and complexity. This talk is targeted at deep learning practitioners who are interested in learning what optimizations are necessary for training their models efficiently at massive scale.”
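The talk description does not include code, but the following minimal sketch illustrates the general pattern it refers to: data-parallel TensorFlow training with a sharded, prefetched input pipeline, float16 compute aimed at Volta Tensor Cores, and allreduce-based gradient averaging (shown here with Horovod, one common choice for multi-GPU TensorFlow; the speakers' actual setup may differ). All shapes, layers, and hyperparameters below are placeholders, not the authors' configuration.

```python
import tensorflow as tf          # TensorFlow 1.x API, matching the era of the talk
import horovod.tensorflow as hvd

hvd.init()  # one process per GPU

# Pin this process to its local GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Synthetic stand-in for the real climate-data reader: each rank reads its
# own shard, and prefetching overlaps input handling with GPU compute.
features = tf.random.uniform([256, 64, 64, 16])
all_labels = tf.random.uniform([256], maxval=3, dtype=tf.int64)
dataset = (tf.data.Dataset.from_tensor_slices((features, all_labels))
           .shard(hvd.size(), hvd.rank())
           .repeat()
           .batch(32)
           .prefetch(4))
images, labels = dataset.make_one_shot_iterator().get_next()

# Toy network: cast to float16 so the heavy math can use Tensor Cores,
# but keep the loss in float32 for numerical stability.
x = tf.cast(images, tf.float16)
x = tf.layers.conv2d(x, filters=8, kernel_size=3, activation=tf.nn.relu)
x = tf.reduce_mean(x, axis=[1, 2])               # global average pooling
logits = tf.cast(tf.layers.dense(x, 3), tf.float32)
loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

# Scale the learning rate with the number of workers and wrap the optimizer
# so gradients are averaged across all GPUs via allreduce.
opt = tf.train.MomentumOptimizer(0.01 * hvd.size(), momentum=0.9)
opt = hvd.DistributedOptimizer(opt)
train_op = opt.minimize(loss, global_step=tf.train.get_or_create_global_step())

hooks = [hvd.BroadcastGlobalVariablesHook(0)]    # sync initial weights across ranks
with tf.train.MonitoredTrainingSession(hooks=hooks, config=config) as sess:
    for _ in range(100):
        sess.run(train_op)
```

A script like this is launched with one process per GPU (for example via mpirun or jsrun), which is the mechanism by which this style of training scales from a single node to thousands of nodes.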
Thorsten Kurth works with the application readiness team to deliver optimized codes for Cori. He also acts as a liaison for defining and demonstrating application portability between the three major US HPC centers, i.e. NERSC, ALCF, and OLCF. Before joining NERSC, Thorsten worked as a postdoc in the Nuclear Science Division at LBNL, where he developed and optimized codes for computing multi-baryon correlations in Lattice QCD. He received his PhD from the University of Wuppertal, Germany, in 2011, where he performed Lattice QCD calculations of electroweak matrix elements and hadron and quark masses. Thorsten’s further interests include machine learning and data analysis.
Josh Romero is a Developer Technology Engineer at NVIDIA with the HPC Software and Benchmarks group.
Learn more about Deep Learning at Berkeley Lab