Google Cloud TPU Pods Speed Machine Learning

Print Friendly, PDF & Email

Today Google announced that its Google Cloud TPU Pods are now publicly available in beta. Designed to help Machine Learning researchers iterate faster and train more capable machine learning models, TPU Pods can include more than 1,000 individual TPU chips connected by an ultra-fast, two-dimensional toroidal mesh network.

To accelerate the largest-scale machine learning (ML) applications deployed today and enable rapid development of the ML applications of tomorrow, Google created custom silicon chips called Tensor Processing Units (TPUs). When assembled into multi-rack ML supercomputers called Cloud TPU Pods, these TPUs can complete ML workloads in minutes or hours that previously took days or weeks on other systems.

While some custom silicon chips can only perform a single function, TPUs are fully programmable, which means that Cloud TPU Pods can accelerate a wide range of state-of-the-art ML workloads, including many of the most popular deep learning models. For example, a Cloud TPU v3 Pod can train ResNet-50 (image classification) from scratch on the ImageNet dataset in just two minutes or BERT (NLP) in just 76 minutes.

Cloud TPU customers see significant speed-ups in workloads spanning visual product search, financial modeling, energy production, and other areas. In a recent case study, Recursion Pharmaceuticals iteratively tests the viability of synthesized molecules to treat rare illnesses. What took over 24 hours to train on their on-prem cluster completed  in only 15 minutes on a Cloud TPU Pod.

A single Cloud TPU Pod can include more than 1,000 individual TPU chips which are connected by an ultra-fast, two-dimensional toroidal mesh network. The TPU software stack uses this mesh network to enable many racks of machines to be programmed as a single, giant ML supercomputer via a variety of flexible, high-level APIs.

The latest-generation Cloud TPU v3 Pods are liquid-cooled for maximum performance, and each one delivers more than 100 petaFLOPs of computing power. In terms of raw mathematical operations per second, a Cloud TPU v3 Pod is comparable with a top 5 supercomputer worldwide (though it operates at lower numerical precision).

It’s also possible to use smaller sections of Cloud TPU Pods called “slices.” We often see ML teams develop their initial models on individual Cloud TPU devices (which are generally available) and then expand to progressively larger Cloud TPU Pod slices via both data parallelism and model parallelism to achieve greater training speed and model scale.

Sign up for our insideHPC Newsletter