Podcast: Why HPC is Vital at NASA


In this podcast, Bill Thigpen describes why supercomputing is an indispensable tool at NASA.

“At NASA, what we’re doing is we’re really looking at ways of making aircraft more efficient. We’re trying to make them quieter. We’re trying to make them get to cruise altitude faster, which saves the taxpayer or the people who are using airplanes a lot of money, and we look at really complex problems. We look at things like rotorcraft. If you think about how that model looks, it’s a very complex model. What do we do with supercomputing? Pretty much everything across the board.”

Bill Thigpen is Branch Chief of Advanced Computing at NASA’s Ames Research Center in Silicon Valley.


Matthew Buffington (Host): Welcome to NASA in Silicon Valley, episode 48. This week we’re talking to Bill Thigpen, chief of the Advanced Computing Branch here at Ames. Bill manages high-end computing activities for NASA’s key system development efforts, like the Columbia and Pleiades supercomputers, some of the world’s fastest and most capable systems. We also talk about his day-to-day role to make sure that these systems are up and running and being used effectively by the scientists and engineers at NASA. So, here is Bill Thigpen.

Host: Welcome, Bill. Tell us a little bit about yourself. How did you end up at NASA? What brought you to Silicon Valley?

Bill Thigpen: Oh, I’ve been playing with computers since the ’70s.

Host: Okay. This is a good place to be for that.

Bill Thigpen: Yeah, but I was in North Dakota.

Host: Okay. Not so good of a place for that. Well, you have time on your hands, though.

Bill Thigpen: Yeah. There was a lot of time on my hands, but I really fell in love with them. At that time, there really wasn’t a way to get a degree in computer science. It didn’t really exist. This was in high school. I left North Dakota and I went to the Air Force Academy for a while, and then I went to the University of Nebraska, and there I got into computer science. I was actually in the first graduating class with a computer science degree. There were six of us.

Host: Really?

Bill Thigpen: At the time, it was kind of buried underneath the math department. But the next semester there were a hundred, and the semester after that there were 200. So it was a real early time to become involved with it. I got really, really lucky, because right out of college I was hired by a company in Nebraska and I started writing parallel operating systems.

Host: Really?

Bill Thigpen: Yeah. That’s not a normal thing you get to do out of college.

Host: Out of college. I was going to say you’re in demand.

Bill Thigpen: Yeah.

Host: There probably wasn’t a lot of you to pick from.

Bill Thigpen: There weren’t many parallel computers around, so I lucked out and got that. When I was going to college, I got married, and my wife made me promise to get a job outside of Omaha. So the job I got was in Bellevue, which is like saying I’m going to get a job outside of San Jose, and you get a job in Santa Clara.

Host: It’s like, “Whew.”

Bill Thigpen: They touch.

Host: Yes.

Bill Thigpen: A couple years went by, and I got an offer to come out to California with the company that I was with. I came out here and was working on really a lot of military stuff: building communication systems, working on secure communication systems. I even got to do some stuff where I wrote programs that ran motors and radars and figured out where to put terminal Doppler weather radars. I actually got a license to drive a semi truck while I was doing that.

Host: Oh, like a CDL.

Bill Thigpen: Eventually, I ended up doing some work out at NASA. While I was here, I moved into supercomputing. So I started in these parallel computers and I ended up in parallel computers.

Host: You’ve always been on the cutting edge.

Bill Thigpen: Well, yeah, I have. It’s been real exciting. My first job at NASA was in 1999, and I got a job as a branch chief in the NASA advanced supercomputing division, and I’ve been there ever since. It’s been amazing. When I first got there, teraflop computing wasn’t around. You didn’t do a trillion floating-point operations on a single system.

During that time, I’ve been fortunate enough to be involved in the deployment of the Columbia supercomputer, which, for a very small period of time, was the fastest computer in the U.S. About a week later Blue Gene came out, and it was the fastest, but Columbia was the fastest production computer for about a year, and in 2004 it got an award as the most significant thing that happened in supercomputing.

Host: Oh, wow.

Bill Thigpen: I got to lead the effort building that. Then in 2008 I got to build Pleiades, which sort of dwarfed the Columbia system. Pleiades has gone from 500 teraflops to seven and a half petaflops, which is where it’s sitting today.

Host: Thinking of the supercomputing, especially as, when you started at NASA in ’99, I’m thinking back, because the Apple II was already a thing. People had Windows 95. The home computer was becoming a thing, and then you’re working on the supercomputer. What makes the computer super? What’s the main difference?

Bill Thigpen: If you think of your home computer, it’s got a processor.

Host: Yeah. Ideally, yes. [Laughs]

Bill Thigpen: I mean, a chip. Sometimes it will have a couple of chips, but when you build a supercomputer, you take that model, and at least in the case of parallel computers, you take that model and you just duplicate it over and over and over and over again. If you look at the Pleiades computer, each rack has 72 nodes in it. A node you can think of as a home computer on steroids. So if your home computer has one processor, which probably has multiple cores, one of our nodes has two processors, with all of those cores, and then we take those and we link them together with a very high-speed network, and it allows each of those nodes to communicate with each other. Then we take those nodes within the rack and we hook them up with other racks, and we keep doing that over and over and over.

Host: Over and over.

Bill Thigpen: When you think of the Pleiades supercomputer, it’s an 11-dimension, dual-plane hypercube. If you look at a single rack, you have cables running within that rack to connect the systems, and within that one rack, you have three dimensions. Then, when you add the rack next to it, you have the fourth dimension. When you add the two racks next to that, you have your fifth dimension. Then you add your four racks next to that, you have your sixth dimension. You add eight racks next to that, you have your seventh. Then you start duplicating rows.

Host: Just going further and further.

Bill Thigpen: What that means is that, when you’re dealing with these really complex problems and you’ve sort of divided the problem among a bunch of these processors, to communicate, they never have to go more than 11 hops to talk from anywhere in the machine to anywhere else in the machine. With the very high-speed networks, it doesn’t really slow them down, and so they’re able to communicate well. So if you want to know what makes a supercomputer different, it really is the speed at which they can communicate with each other. Then there are some other kind of difficult problems that get stacked on top of that. Things like the file system. If you think about the disk in your home computer, you have you using it. Nobody else is using it at the same time.

But if you looked at our computer, we have hundreds of users all accessing the same file system or all accessing the same disk all over that network that we were talking about. So the ability of the software to deal with all those requests coming in and out becomes very complex.
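The "never more than 11 hops" figure from the hypercube description above follows from how a hypercube is wired, and it can be checked with a short sketch. This is an idealized model, not a description of the actual Pleiades network: it assumes a plain binary hypercube in which every switch point has an 11-bit address and is cabled to the addresses that differ from it in exactly one bit, and it ignores the "dual-plane" part entirely. The function names (hop_count, diameter, racks_spanned) are made up for illustration.

```python
def hop_count(a: int, b: int) -> int:
    """Shortest path between two hypercube addresses.

    Each hop flips one address bit, so the distance is the number
    of bit positions in which the two addresses differ.
    """
    return bin(a ^ b).count("1")


def diameter(dimensions: int) -> int:
    """Worst-case hop count in a binary hypercube.

    A hypercube looks the same from every vertex, so measuring
    from address 0 to every other address is enough.
    """
    vertex_count = 1 << dimensions
    return max(hop_count(0, vertex) for vertex in range(vertex_count))


def racks_spanned(total_dims: int, dims_per_rack: int = 3) -> int:
    """Racks covered once total_dims dimensions are in use.

    Assumes, as in the transcript, that one rack holds three
    dimensions and each further dimension doubles the rack count.
    """
    return 2 ** (total_dims - dims_per_rack)


worst_case_hops = diameter(11)
racks_at_seven_dims = racks_spanned(7)
```

Under these assumptions the worst case in an 11-dimension hypercube is 11 hops, and seven dimensions cover 16 racks, which matches the 1 + 1 + 2 + 4 + 8 racks counted out in the interview.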

Host: Because it’s not like one person is using the Pleiades supercomputer at one time. You’ve got multiple people trying to get something out of it.

Bill Thigpen: Right. You can go online. Anyone who’s listening to this could go online right now and look at what our computer is doing. That website is public. It shows how many jobs are running. It shows how busy the system is. It shows how much of it’s available to use. So people can see it. Generally, what we have is, we’ll have hundreds of jobs running at any given time. The majority of our time is spent with jobs running at about the 8,000-core level. That’s multiple nodes. That’s hundreds of nodes.

Host: I was going to say, yeah, that’s like hundreds of personal, i7 beefed-up computers.

Bill Thigpen: Yep. That’s exactly it, and we’re using Intel processors, so it’s Intel Inside type thing.

Host: Yeah.

Bill Thigpen: They’re a sort of commodity, but it’s a very different environment.

Host: Is there a limit to how many dimensions and racks you can add to it, or is there a certain point of maybe diminishing returns, where you can keep adding more, but you’re not getting quite enough out of it, or is this just like exponential, and you can get bigger and bigger and bigger?

Bill Thigpen: Your limiting factor is money.

Host: As always. [Laughs]

Bill Thigpen: I mean, on a number of levels. One of the levels is the power that you draw.

Host: Okay. Getting really hot.

Bill Thigpen: Well, it’s not just that it gets really hot. It gets really expensive. Pleiades draws four megawatts of power.

Host: Oh, wow. And compare that…

Bill Thigpen: Your house draws one kilowatt.

Host: Okay. [Laughs] All right. You got a neighborhood, a small town.

Bill Thigpen: Yeah. You have a town.

Host: Okay. Yeah.

Bill Thigpen: And that costs money. We have users today that, in order for them to really solve the problem that they’re trying to solve, they would need a yottaflop computer.

Host: Yottaflop?

Bill Thigpen: A yottaflop. And no one knows what a yottaflop is, right?

Host: Yeah, no idea.

Bill Thigpen: No one has an idea.

Host: You could be making it up. I wouldn’t know.

Bill Thigpen: Right. It’s hard enough thinking about the scales that we talk about, a petaflop. Years ago, people didn’t know what a “peta” was. They didn’t know what a “tera” was. Most people know what a “giga” is.

Host: Yes. We got that.

Bill Thigpen: A “giga” is a billion, and when you have a thousand of those, you have a “tera,” and when you have a thousand of those, you have a “peta.”

Host: Okay. Like it gets bigger.

Bill Thigpen: Okay. Right. Then, when you have a thousand of those, you have an “exa,” and you have a thousand of those, you have a “zetta,” and when you have a thousand of those, you have a “yotta.”

Host: All righty.

Bill Thigpen: Okay? So that’s the scale that we’re…

Host: You’ve already gone way past my conceptual mind.

Bill Thigpen: Right. That’s the scale that we have users who need today. And you go, “Well, why would anyone need something that big?” It’s really simple. They’re trying to model very complex things. In this one case, the user is trying to model the magnetosphere. That’s the sort of protective shell that surrounds the Earth.
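The prefix ladder described above (giga, tera, peta, exa, zetta, yotta) is just repeated multiplication by a thousand, which a few lines of arithmetic make concrete. The sketch below is illustrative only; the 7.5-petaflop figure is the Pleiades number quoted earlier in the conversation, and the variable names are made up.

```python
# Prefixes in the order they are listed in the conversation.
PREFIX_ORDER = ["giga", "tera", "peta", "exa", "zetta", "yotta"]


def prefix_values():
    """Build the ladder: giga is a billion, each later prefix is 1000x the previous."""
    values = {}
    value = 10**9
    for name in PREFIX_ORDER:
        values[name] = value
        value *= 1000
    return values


PREFIXES = prefix_values()

# Pleiades at 7.5 petaflops, written as an exact integer (75 * 10^14).
pleiades_flops = 75 * 10**14

# How many Pleiades-sized machines one yottaflop corresponds to (rounded down).
yotta_over_pleiades = PREFIXES["yotta"] // pleiades_flops
```

On these numbers a yottaflop machine would be roughly 133 million times the size of Pleiades as quoted here, which gives a sense of how far the magnetosphere problem sits from what can be run today.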

Host: That falls into the obvious follow up question. When you think of NASA, rocket launches, training astronauts, exploring the solar system and beyond, it kind of comes into NASA has a supercomputer. So how does NASA use this supercomputer?

Bill Thigpen: Oh, wow.


Host: It permeates everything. Every section, whether it’s aeronautics and it’s exploration, it always seemed to have the supercomputer involved.

Bill Thigpen: One of my titles is the deputy project manager for the High-End Computing Capability project.

Host: Okay.

Bill Thigpen: Its role is to provide this tool to NASA. NASA in 2004 realized that high-end computing was an essential tool for it to do its job, and it wasn’t just an essential tool for aeronautics or for designing spacecraft. It was needed across the board.

We support scientists doing astrophysics, so they’re looking at the origins of the universe. They’re looking at what happened when universes collide. They look at what happened when black holes collide. There are scientists that are looking from the Big Bang all the way to the present. They’re trying to understand the fundamental workings of everything.

Host: Of everything. Of literally everything.

Bill Thigpen: Right. Then we have planetary scientists. You’ve heard about the Kepler mission and discovering new planets. They’re heavy users of our computer. They’re sitting there, looking at these results coming from space, and they’re running through thousands of different results, looking to see this tiny blip that is this planet moving in front of the star. They have the science and theory to be able to tell you what that planet looks like and what temperature that planet is, whether or not man could live on that planet.

We have heliophysics. It turns out that we have a star that’s pretty close to us. It’s really important to look at, because we depend on that thing, right?

Host: Yes.

Bill Thigpen: So we have scientists that look at that. They run simulations trying to understand better what’s happening in our star.

Then we have earth science, which is a very important part of what NASA does. I always used to explain that we had satellites that point out and we have satellites that point down. The satellites that point down are gathering data that are helping us understand better what’s happening on this planet.

With that data, policymakers can make decisions to change laws on how you deal with fuel consumption. They can look at what’s happening to the forest, et cetera. But also, scientists are looking at things like earthquakes, modeling hurricanes.

Host: Erosion.

Bill Thigpen: We have scientists that are working on things like helping farmers better utilize their land, and all of that is done on these systems. So that’s sort of the science mission side.

The human explorations and operations, in that area, we’re looking at designing capsules, designing rockets. It used to be that you would have to build models and take those models into wind tunnels and do tests. Well, with a computer, you can run through thousands of changes in a very rapid manner. You can really save money and the cost to get from idea to when you bend metal, when you actually build the rocket. But there are also things that you can test in a computer that you can’t test anywhere.

Host: You can perfect it in the computer, get it refined and then eventually, towards the end, when you think you really got it straight, then go to a wind tunnel and see if it’s the same.

Bill Thigpen: Right. But it turns out now there are planes that are being built that never go to the wind tunnel.

Host: Oh, wow.

Bill Thigpen: The accuracy of the models has improved so much by these advances in supercomputing that it’s no longer required on some designs to even go to a wind tunnel, but it is a requirement to have a supercomputer to build an airplane. You look at the work on rockets. When you think about, as the rocket stages are coming off, well, what happens to the rocket if they reconnect? How do you test that? In a computer, you can test it.

Host: Yeah. Like even the temperature, I’m guessing…

Bill Thigpen: Right.

Host: … the humidity, minor tweaks, and just paint out all your scenarios.

Bill Thigpen: Some of the things that people don’t really think about sometimes is we even use supercomputers to model the launch environment.

Host: Yeah. I remember seeing a simulation on one of the hyperwalls.

Bill Thigpen: Yep.

Host: It was a sketch of the SLS rocket that NASA’s building now, and it was showing the plumes and the rocket fuel, the liquid, and…

Bill Thigpen: Right.

Host: …all of the stuff taking off, and all those little bits of swirling around have a purpose. It’s not like an artistic rendering of it.

Bill Thigpen: Right.

Host: It’s the supercomputer.

Bill Thigpen: What that supercomputer allowed us to do was to decide whether or not we needed a new mobile launch platform.

Host: Oh, wow. Whether they need to rebuild the stuff at Kennedy.

Bill Thigpen: Right. At $2 billion cost.

Host: Yes.

Bill Thigpen: So you ask why we have supercomputers. We have supercomputers to save the taxpayer money. The next thing that we’re really looking at is in aeronautics. It’s real funny. A lot of people don’t realize NASA does aeronautics.

Host: Yeah. I mean, it’s literally in the name.

Bill Thigpen: Right.

Host: NASA.

Bill Thigpen: It’s part of the name. This center has done aeronautics for a long time.

Host: Since before NASA. Right.

Bill Thigpen: Yeah. When it was NACA. But supercomputing, its really first responsibility was for aeronautics. It was for designing airplanes. It was for making improvements on aircraft.

At NASA, what we’re doing is we’re really looking at ways of making aircraft more efficient. We’re trying to make them quieter. We’re trying to make them get to cruise altitude faster, which saves the taxpayer or the people who are using airplanes a lot of money, and we look at really complex problems. We look at things like rotorcraft. If you think about how that model looks, it’s a very complex model. What do we do with supercomputing? Pretty much everything across the board.

Host: Well, it’s funny, because if you think of NASA, whether it’s telescopes or even wind tunnels or all the instrumentation that NASA does, it’s all about data collection. Data collection. We have sort of Excel sheets full and full of data, full of all the data you could think of, but at the end of the day, the magic happens or the science happens when you take that data and turn it into knowledge.

Bill Thigpen: Right.

Host: A big part about processing all of that data, it’s like you have a supercomputer on your side. That can help make those connections, I guess.

Bill Thigpen: Yeah. It’s very interesting, because a lot of times, when you’re in supercomputers, what you really talk about is how many flops you have or how big a computer you have or how fast a computer you have. Our goal is not to have a fast computer. Our goal is to really enhance science and engineering. It’s a tool. It’s a tool to make the really cool things that NASA’s chartered to do possible.

Host: Yes. Starting back in 1999 and seeing where we are now, but then looking to the future 5, 10 years, what is the future for supercomputing? I know we’ve talked about quantum computing and that leap. What’s your job going to look like coming up?

Bill Thigpen: I’m thrilled with what we’re doing right now. For probably the last 10 years, I’ve really struggled with how to provide NASA with more computing, given the constraints that we have. One of the constraints that we have is the building that we’re in. So we’re moving from traditional computing within a building to a new concept of doing supercomputing in modules. A lot of people said, “Well, why would you do that?”

Well, earlier I said that supercomputers use a lot of power. Well, that power sort of equates to heat. So we have to get that heat out of the facility. Four megawatts for the computer? About one and a half megawatts to cool the computer.

Host: Wow. And it’s very expensive to do all this heating and cooling.

Bill Thigpen: Right. It’s probably closer to one megawatt, but it’s expensive. Basically, for every dollar we spend on computing, we’re spending 26 cents to cool that computer. That’s sort of where we are in our supercomputing facility. So I looked at this new way of deploying computers. By using this module, instead of going through the very complex set of systems that we have to go through to cool the computer inside, which means we have to go through a cooling tower, we have to pump the water from the cooling tower to chillers, we have to pump a different set of water from the chillers up to the computers, we have to get it back down, all of that activity is where you get that 26 cents.

So I’ve changed that to putting these computers in modules outside and using the air that’s free. Now, instead of spending 26 cents, I’m spending less than 3.
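The cooling figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that cost scales directly with power drawn, which is a simplification, and it treats "less than 3" cents as exactly 3, so the saving it reports is a lower bound. The function names are made up for illustration.

```python
def cooling_cents_per_compute_dollar(compute_mw: float, cooling_mw: float) -> float:
    """Cooling overhead in cents per dollar of computing, assuming cost tracks power."""
    return 100.0 * cooling_mw / compute_mw


def overhead_reduction(before_cents: float, after_cents: float) -> float:
    """Fraction of the cooling overhead removed by moving to the new approach."""
    return (before_cents - after_cents) / before_cents


def homes_equivalent(system_kw: float, home_kw: float = 1.0) -> float:
    """Number of homes drawing home_kw each that match the system's draw."""
    return system_kw / home_kw


# 4 MW of computing with about 1 MW of cooling, as stated in the interview.
in_building_cents = cooling_cents_per_compute_dollar(4.0, 1.0)

# Going from 26 cents to (at most) 3 cents per computing dollar.
modular_saving = overhead_reduction(26.0, 3.0)

# 4 MW expressed in 1 kW households.
homes = homes_equivalent(4000.0)
```

With about one megawatt of cooling for four megawatts of computing, the overhead works out to 25 cents on the dollar, close to the 26 cents quoted. Dropping from 26 cents to under 3 removes at least about 88 percent of the cooling overhead, and the four-megawatt draw is on the order of 4,000 homes at the one-kilowatt figure mentioned earlier.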

Host: Awesome.

Bill Thigpen: The other side of that is, instead of evaporating a lot of water, I’m barely using any water at all. So we’re able to provide the computing for less energy, using less water. We deployed a prototype that really went operational at the end of last year. Because of the success of that system, we’re looking at deploying a much larger system that will allow us to meet a lot of those needs.

On one hand, I kind of see this growth of capability based on technology that we have today and the growth that we see there, but I see other types of advanced computing coming in. You mentioned quantum computing. We have a quantum computer at the NAS. It’s a very experimental machine. It’s looking at a different type of computing. It’s better at modeling things that are uncertain. As a digital person, I kind of think of everything as ones and zeroes. It’s either on or off.

Host: Yes. It’s on or off. Yes. There’s electricity going through it, or there isn’t.

Bill Thigpen: Right. If it’s on, every time I look at it, it’ll be on, unless I go and turn it off.

Host: [Laughs] Not with those pesky quantum computers.

Bill Thigpen: Right. Quantum computers are better at modeling life. Think about it. If I came into your office this morning and I said, “Are you going to go to dinner tonight?” you would have an answer, and it would be a true answer, and it would either be yes or no. But if I came in now or after lunch and asked you the same question, it could change. That’s what life is really like. A digital system isn’t necessarily good at modeling that. A quantum system is much better at modeling it, but we’re in the very early stages. It used to be, on digital computers, you would run a job three times, and if you got the same answer twice, you’d call it good.

Host: Wow. [Laughs]

Bill Thigpen: We eventually got to a point where, every time you ran the problem, it would be the same answer. Well, quantum is still in that really early stage, where they’re trying to see if what they think is going on really is going on.

Host: Yeah. I heard somebody describe it one time of the Wright Brothers. This is the proof of concept…

Bill Thigpen: Right.

Host: …but if you’re thinking landing somebody on the Moon as the personal home quantum computer, we’re a long ways off from moving that proof of concept into something.

Bill Thigpen: I think what you’re going to see, though, and what you’re already seeing is that a lot more people are moving into supercomputers. They’re really becoming an essential part of business. They’re an essential part of corporate capability or being able to be competitive at all.

It was real funny, because I got to see this stuff happening. It used to be that no airplane company did supercomputing. Now they all do. It used to be that racecar teams did not use supercomputing. Now they all do. I saw a presentation once from the people from Procter & Gamble. They actually modeled the Pringle potato chip…

Host: Oh, that’s amazing.

Bill Thigpen: … on supercomputers, because they go through the factory so fast. The reason it’s shaped the way it’s shaped is so it won’t fly off the assembly line. They have that same problem with disposable diapers.

Host: Okay. It’s just going through that factory, that machine…

Bill Thigpen: It’s going through so fast. So they have to model those processes.

Host: The aerodynamics, to make for sure it doesn’t go flying off the assembly line.

Bill Thigpen: I’ve been lucky enough to go down to Mexico several times to talk to them about supercomputing, and I’ve talked to the government in Jalisco, and I talked to them about how important supercomputing is from a country point of view, because it allows you to make decisions and sort of chart your own path. As a country, you could decide to spend money on supercomputers or you could let other countries spend money on supercomputers and tell you what to do.

Host: Yes. For folks who are looking for more information on some of the stuff that you’re working on, I know, on nasa.gov/ames, we have a landing page for supercomputing. We’ll add those into the show notes, so anybody can go ahead and check those out.

Bill Thigpen: Okay. Perfect.

Host: We’re online, so on Twitter at @NASAAmes, and we are using the hashtag #NASASiliconValley, so if anybody has questions for Bill about the wonderful world of supercomputing, go ahead and tweet us some responses, and we’ll hook you up with Bill to get some answers for you.

Bill Thigpen: I’d be really pleased to answer them.

Host: Thank you so much for coming over. This has been fascinating.

Bill Thigpen: Thank you.

Bill Thigpen manages staff and activities for key system development efforts—most recently, expanding the Pleiades supercomputer, which ranks among the world’s fastest and most capable systems; and leading the effort to procure and deploy a modular high-performance computing center to significantly reduce the water and power required to cool today’s supercomputers. As Deputy Project Manager for NASA’s High-End Computing Capability Project, Thigpen also leads the team that provides day-to-day operations to ensure delivery of world-class supercomputing resources and services supporting NASA missions.

In addition, Thigpen oversees the deployment and testing of ground-breaking technologies. Currently, efforts include working with the NASA community to migrate to many-core accelerators from both NVIDIA and Intel, and evaluating the potential benefits of quantum computing on NASA’s mission challenges.

Thigpen’s 34-year career supporting government contracts includes software engineering and mission support management. Before joining NAS in 1999, he worked for Sterling Software, Inc. at the NAS facility, leading the advanced systems development group, and managing technical teams that provided HEC capabilities and services to NASA researchers. He holds a bachelor’s degree in computer science from the University of Nebraska.
