Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:


MIT Professor Runs Record Google Compute Engine job with 220K Cores

Andrew Sutherland of MIT

Over at the Google Blog, Alex Barrett writes that an MIT math professor recently broke the record for the largest-ever Compute Engine cluster, with 220,000 cores on Preemptible VMs. According to Google, this is the largest known HPC cluster to ever run in the public cloud.

Andrew V. Sutherland is a computational number theorist and Principal Research Scientist at MIT, and is using Compute Engine to explore generalizations of the Sato-Tate Conjecture and the conjecture of Birch and Swinnerton-Dyer to curves of higher genus. In his latest run, he explored 1017 hyperelliptic curves of genus 3 in an effort to find curves whose L-functions can be easily computed, and which have potentially interesting Sato-Tate distributions. This yielded about 70,000 curves of interest, each of which will eventually have its own entry in the L-functions and Modular Forms Database (LMFDB).

The flexibility of the Google Compute Engine was key for Sutherland in his quest to explore generalizations of the Sato-Tate Conjecture and the conjecture of Birch and Swinnerton-Dyer to curves of higher genus. In his latest run, he explored 1017 hyperelliptic curves of genus 3 in an effort to find curves whose L-functions can be easily computed, and which have potentially interesting Sato-Tate distributions. This yielded about 70,000 curves of interest, each of which will eventually have its own entry in the L-functions and Modular Forms Database (LMFDB).

Sutherland first ran jobs on his own 64-core machine, which could take months, or wrangled for compute time on one of MIT’s clusters. But getting the number of cores he needed often raised eyebrows, and he was limited by the software configurations he could use. By running on Compute Engine, Sutherland can install exactly the operating system, libraries and applications he needs, and thanks to root access, he can update his environment at will.”

Sutherland considered running his jobs on AWS before choosing Google but was dissuaded by its Spot Instances model, which forces you to name your price up front, with prices that can vary significantly by region and fluctuate over time. A colleague encouraged him to try Compute Engine Preemptible VMs. These are full-featured instances that are priced up to 80% less than regular equivalents, but can be interrupted by Compute Engine. That was fine with Sutherland. His computations are embarrassingly parallel — they can be easily separated into multiple, independent tasks — and he grabs available instances across any and all Google Cloud Regions. An average of about 2-3% of his instances are typically preempted in any given hour, but a simple script automatically restarts them as needed until the whole job is complete.”

For Sutherland’s next next run, he hopes to expand his search to non-hyperelliptic curves of genus 3, breaking his own record with a 400,000-core cluster.

It changes your whole outlook on research when you can ask a question and get an answer in hours rather than months,” he said. “You ask different questions.”

Sign up for our insideHPC Newsletter

Resource Links: