Podcast: A Retrospective on Great Science and the Stampede Supercomputer


TACC will soon deploy Phase 2 of the Stampede II supercomputer. In this podcast, they celebrate by looking back on some of the great science computed on the original Stampede machine.

In 2017, the Stampede supercomputer, funded by the National Science Foundation, completed its five-year mission to provide world-class computational resources and support staff to more than 11,000 U.S. users on over 3,000 projects in the open science community. But what made it special? Stampede was like a bridge that moved thousands of researchers off soon-to-be-decommissioned supercomputers, while at the same time building a framework that anticipated the imminent trends that came to dominate advanced computing.

Change was in the air at the National Science Foundation (NSF) in 2010, two years into the operation of the soon-to-be-retired Ranger supercomputer of the Texas Advanced Computing Center (TACC). Ranger represented a new class of cutting-edge computing systems designed specifically to be used by more people: U.S. researchers from all fields of science and engineering. Ranger and a few other systems of the NSF-funded TeraGrid cyberinfrastructure, such as Kraken at the National Institute for Computational Sciences at UT Knoxville, were going to come offline in the next few years.

Supercomputers live fast and retire young. That’s because technology advances quickly and each generation of computer processors is significantly faster, and cheaper to operate, than the one before it. Expectations were high for the successor to Ranger, a system called Stampede, built by TACC, that proved to be 20 times more powerful than Ranger while using only half the electricity.

“We knew, as we were designing Stampede, that we had to inherit a huge amount of workload from the systems that were going offline,” said Dan Stanzione, executive director of TACC and the principal investigator of the Stampede project. “And at the same time, you could see that architectural changes were coming, and we had to move the community forward as well. That was going to be a huge challenge.”

The challenge was, and still is, to match the breakneck speed of change in computer hardware and architectures. With Ranger, one fundamental architectural change was the move to four four-core processors on each compute node. “It was clear that this trend was going to continue,” Stanzione said.

This trend toward “manycore” processors, as they are known, would force changes to the programming models that researchers use to develop application software for high-tech hardware. Since scientific software changes its structure much more slowly than hardware, sometimes over the course of years, it was critical to get researchers started down the road to manycore.

“We needed to take on this enormous responsibility of all of the old workload that was out there for all of the systems that were retiring, but at the same time start encouraging people to modernize and go towards what we thought systems were going to look like in the future,” Stanzione said. “It was an exciting time.”

Designing the Stampede supercomputer required foresight and awareness of the risks in planning a multimillion-dollar computing project that would run seven years into the future. Stanzione and the team at TACC wrote the proposal in 2010 based on hardware that was still being developed and didn’t yet exist: the Intel Xeon E5 (Sandy Bridge) processor, the Intel Xeon Phi co-processor, and the Dell servers. TACC deployed Stampede on schedule in 2013 and consistently met and exceeded its proposed goal of providing the open science community with 10 petaflops of computing power. An upgrade in 2016 added 1.5 petaflops to the system in the form of Knights Landing processors, standalone processors that Intel released that year. What’s more, TACC operated a world-class facility to support, educate, and train users in making full use of Stampede.

“One of the things that I’m proud of is that we’ve been able to execute both on time and on budget. We delivered the exact system we had forecast,” Stanzione said.

NSF awarded The University of Texas at Austin $51.5 million for TACC to deploy and support the Stampede supercomputer, which included a hardware upgrade in 2016. During its five years of operations, Stampede completed more than eight million successful computing jobs and clocked over three billion core hours of computation.

Download the MP3

Read the Full Story
