In this video from the Intel HPC Developer Conference, Mark O'Connor from Allinea Software describes how the company's performance optimization tools can speed up machine learning code.
“The majority of deep learning frameworks provide good out-of-the-box performance on a single workstation, but scaling across multiple nodes is still a wild, untamed borderland. This session follows the story of one researcher trying to make use of a significant compute resource to accelerate learning over a large number of CPUs. Along the way we note how to find good multiple-CPU performance with Theano* and TensorFlow*, how to extend a single-machine model with MPI, and how to optimize its performance as we scale out and up on both Intel Xeon and Intel Xeon Phi architectures. Finally, we address the greatest question of our time: how many CPUs does it take to learn Atari games faster than a 7-year-old child?”
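The "extend a single-machine model with MPI" step the abstract mentions usually means data-parallel training: each rank computes gradients on its own shard of the data, then an allreduce averages them so every rank applies an identical update. The sketch below is illustrative only, not code from the talk; it simulates four ranks in one process with plain NumPy, where a real MPI job would give each worker its own rank and replace the manual average with `mpi4py`'s `comm.allreduce(g, op=MPI.SUM) / comm.Get_size()`.

```python
import numpy as np

def local_gradient(w, X, y):
    # Least-squares gradient on this worker's shard: d/dw mean((Xw - y)^2)
    return 2.0 * X.T @ (X @ w - y) / len(y)

def allreduce_mean(grads):
    # Stand-in for MPI_Allreduce with MPI_SUM followed by division by the
    # communicator size; every rank ends up with the same averaged gradient.
    return sum(grads) / len(grads)

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])   # target weights the workers should recover
w = np.zeros(2)

# Simulate 4 "ranks", each holding its own shard of the training data.
shards = []
for _ in range(4):
    X = rng.normal(size=(64, 2))
    y = X @ w_true
    shards.append((X, y))

for step in range(200):
    grads = [local_gradient(w, X, y) for X, y in shards]
    w -= 0.05 * allreduce_mean(grads)   # identical update on every rank

print(np.allclose(w, w_true, atol=1e-3))
```

Because the averaged gradient is the same everywhere, the replicas never drift apart; the communication pattern is a single allreduce per step, which is what makes this approach scale well across nodes.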
See Machine Learning videos from the Intel HPC Developer Conference