Deep Learning and Automatic Differentiation from Theano to PyTorch


In this video from CSCS-ICS-DADSi Summer School, Atilim Güneş Baydin presents: Deep Learning and Automatic Differentiation from Theano to PyTorch.

Inquisitive minds want to know what causes the universe to expand, how M-theory binds the smallest of particles, or how social dynamics can lead to revolutions. In recent centuries, developments in science and technology have brought us closer to exploring the expanding universe, discovering unknown particles such as bosons, and finding out how and why a society interacts and reacts. To explain the fascinating phenomena of nature, natural scientists develop complex "mechanistic models" of a deterministic or stochastic nature. But the hard questions are how to choose the best model for our data and how to calibrate the model given the data.

Statisticians answer these questions with Approximate Bayesian Computation (ABC), which we learn on the first day of the summer school and combine with High Performance Computing. The second day focuses on a popular machine learning approach, deep learning, which mimics the deep neural network structure in our brain in order to predict complex phenomena of nature. The summer school takes a route of open discussion and brainstorming sessions in which we explore two cornerstones of today's data science, ABC and deep learning accelerated by HPC, with hands-on examples and exercises.
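To make the ABC idea above concrete, here is a minimal rejection-ABC sketch in Python. The function names, prior, threshold, and toy model are illustrative assumptions for this post, not material from the summer school:

```python
# Minimal rejection-ABC sketch: keep prior draws whose simulated
# summary statistic lies within eps of the observed summary.
# (All names and the toy model are illustrative assumptions.)
import random

def rejection_abc(observed, prior_sample, simulate, eps, n_draws=10000):
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()        # draw a parameter from the prior
        summary = simulate(theta)     # simulate data, compute a summary
        if abs(summary - observed) < eps:  # distance-based acceptance
            accepted.append(theta)
    return accepted                   # approximate posterior sample

# Toy example: infer the mean of a normal with known sd = 1
# from an observed sample mean of 1.8.
random.seed(0)
posterior = rejection_abc(
    observed=1.8,
    prior_sample=lambda: random.uniform(-5, 5),  # flat prior on the mean
    simulate=lambda m: sum(random.gauss(m, 1) for _ in range(20)) / 20,
    eps=0.1,
)
# The mean of the accepted draws approximates the posterior mean.
print(sum(posterior) / len(posterior))
```

The acceptance threshold `eps` trades accuracy against acceptance rate: a smaller `eps` gives a better posterior approximation but rejects more simulations, which is one reason HPC acceleration matters for ABC.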

We are ready to begin a journey with you toward unveiling the mysteries of nature, sharing and integrating ideas from ABC and Deep Learning.



  1. Peter Foelsche says

    I have been working on AD since around 2000. I did my first forward-mode AD implementation in C++ in 2002 (dual numbers). That implementation has been optimized ever since, and it is still being optimized. The problem of different places in the code depending on different subsets of the independent variables was solved (by metaprogramming) around 2009. I also exploit mixing the chain rule with ordinary forward differentiation to minimize the number of derivatives being carried. In my experience, dual numbers yield much better performance than source code transformation, for a variety of reasons. For higher-order derivatives I have been using truncated Taylor series for several years now, in C++. I am very curious that people today (2017) are using Python to perform automatic differentiation! What a waste of CPU time.
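The dual-number technique the commenter describes can be sketched briefly. This is a minimal Python illustration of forward-mode AD (the class and function names are made up for this sketch; production systems like the commenter's C++ implementation, or PyTorch's reverse-mode autograd, involve far more machinery):

```python
# Forward-mode AD with dual numbers: each value carries its derivative
# alongside it, propagated by the usual differentiation rules.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val = val  # primal value
        self.dot = dot  # derivative part

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def derivative(f, x):
    # Seed the dual part with 1.0 to obtain df/dx at x.
    return f(Dual(x, 1.0)).dot

# f(x) = x^2 + 3x  =>  f'(x) = 2x + 3, so f'(2) = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # prints 7.0
```

One dual-number pass yields the derivative with respect to a single input, which is why forward mode suits functions with few inputs, whereas deep learning, with millions of parameters and a scalar loss, favors reverse mode.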