Epic HPC Road Trip stops at NERSC for a look at Big Network and Storage Challenges


In this special guest feature, Dan Olds from OrionX continues his Epic HPC Road Trip series with a stop at NERSC.

Dan Olds from OrionX hits the road for SC18 in Dallas.

My next to last road trip stop was at Lawrence Berkeley National Lab, where we talked primarily to NERSC (National Energy Research Scientific Computing Center) folks. Located high above Berkeley, California, the lab has a view that can’t be beat.

NERSC is unusual in that they receive more data than they send out. Client agencies process their raw data on NERSC systems and then export the results to their own organizations. This puts a lot of pressure on storage and network I/O, making them top priorities at NERSC.

First up on the video is an interview with Glenn Lockwood, HPC Performance Engineer in the NERSC Advanced Technologies Group. He specializes in parallel I/O and large-scale storage systems and has a resume of journal articles, papers and presentations as long as your arm. In simple terms, as he said in the video, “I’m tasked with figuring out what the future storage systems of the world will look like.”

In the video, he talks about the struggle to increase IOPS, particularly when it comes to today’s new workloads, like large-scale astronomy observations. According to Glenn, their usage pattern is completely different from what they’d traditionally see with simulation data. The astronomy folks don’t necessarily know what they’ll need to save and what they can discard, so they look at bits of data all over the array to see what’s significant and needs further processing.

Because of these factors, their next supercomputer, Perlmutter, will be completely based on flash storage. Glenn told us that Perlmutter will be the first supercomputer designed from the ground up to support both data intensive and traditional simulation/modeling workloads.
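To get a sense of why that kind of scattered, sample-everywhere access pushes you toward IOPS and flash rather than just streaming bandwidth, here’s a rough back-of-the-envelope comparison. All of the sizes below are hypothetical round numbers of my own choosing, not NERSC or Perlmutter figures:

```python
# Rough, hypothetical comparison of I/O operation counts for the same dataset:
# a streaming, simulation-style pass versus scattered, astronomy-style sampling.
# All sizes are made-up round numbers, chosen only to show the orders of magnitude.

dataset_bytes    = 100 * 2**40       # 100 TiB of observational data (hypothetical)
stream_io_size   = 4 * 2**20         # 4 MiB sequential reads, checkpoint-style
random_io_size   = 4 * 2**10         # 4 KiB scattered reads, sampling "all over the array"
touched_fraction = 0.10              # suppose only 10% of the data is actually examined

stream_ops = dataset_bytes / stream_io_size
random_ops = (dataset_bytes * touched_fraction) / random_io_size

print(f"Sequential pass over everything: {stream_ops:,.0f} large I/Os")
print(f"Random sampling of just 10%:     {random_ops:,.0f} small I/Os")
print(f"Ratio: ~{random_ops / stream_ops:,.0f}x more operations for the scattered workload")

# Even though the scattered workload reads far less data, it issues roughly
# 100x more (and much smaller) operations, so IOPS rather than streaming
# bandwidth becomes the bottleneck. That is the case for flash.
```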

Our next interview was with Eli Dart, Network Engineer for the Energy Sciences Network (ESnet). ESnet is a national-scale fiber optic network that connects the DOE national labs to each other and to the outside world. Eli also has a very long list of accomplishments in terms of published papers, reports, and journal articles.

Eli has a pretty big job. In terms of aggregate volume, ESnet moves something like 80 PB per month. Speed is also pretty sporty: under good conditions, they can transfer multiple gigabytes per second coast to coast. This kind of speed and bandwidth makes the labs much more efficient due to the network effect, and it’s engineers like Eli who make it happen.
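Just for fun, here’s a quick back-of-the-envelope on what 80 PB a month works out to as a sustained rate. This is a rough sketch; the 30-day month and decimal petabytes are my simplifying assumptions, not ESnet’s figures:

```python
# Back-of-the-envelope: average sustained rate implied by ~80 PB per month.
petabyte = 10**15                      # decimal petabyte, in bytes (assumption)
monthly_volume = 80 * petabyte         # ~80 PB/month, per the interview
seconds_per_month = 30 * 24 * 3600     # ~2.6 million seconds (30-day month assumed)

avg_rate_gb_per_s = monthly_volume / seconds_per_month / 10**9
print(f"Average sustained rate: ~{avg_rate_gb_per_s:.0f} GB/s, around the clock")
# Roughly 30 GB/s of aggregate traffic on average, before any peaks, which puts
# those multi-gigabyte-per-second coast-to-coast transfers in perspective.
```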

The biggest challenges include scalability: letting scientists handle and process megabytes, gigabytes, terabytes, and even petabytes of data without having to expend any more effort as the dataset grows. The key to making this work is matching up data with compute and vice versa. To do that, you need a highly flexible network infrastructure that can keep scaling with exponentially growing workloads.

When asked “What keeps you up at night?”, Eli responded simply, “Falling behind.” What gets him up in the morning are the results of the science his network enables, in terms of helping humanity and pushing the frontiers of research.

Eli also gave us a live look at the ESnet network overview, showing how the network was being utilized at that moment. Very cool.

Many thanks go out to Cray for sponsoring this journey. With NERSC in the bag, Dan has one more stop down the road at LLNL.

Dan Olds is an Industry Analyst at OrionX.net. An authority on technology trends and customer sentiment, Dan Olds is a frequently quoted expert in industry and business publications such as The Wall Street Journal, Bloomberg News, Computerworld, eWeek, CIO, and PCWorld. In addition to server, storage, and network technologies, Dan closely follows the Big Data, Cloud, and HPC markets. He writes the HPC Blog on The Register, co-hosts the popular Radio Free HPC podcast, and is the go-to person for the coverage and analysis of the supercomputing industry’s Student Cluster Challenge.
