An Interview With Steve Scott, CTO Tesla Business Unit, NVIDIA


This interview appears courtesy of The Exascale Report.

According to Steve Scott, recently appointed CTO of the Tesla Business Unit at NVIDIA, “Oak Ridge’s decision to base Titan on Tesla GPUs underscores the growing belief that GPU-based heterogeneous computing is the best approach to reach exascale computing levels within the next decade.”

In the first phase of the Titan deployment, which is currently underway, Oak Ridge will upgrade its existing Jaguar supercomputer with 960 Tesla M2090 GPUs, based on the NVIDIA® “Fermi” architecture. These GPUs will serve as companion processors to the AMD multi-core CPUs. In the second phase, expected to begin in 2012, Oak Ridge plans to deploy up to 18,000 Tesla GPUs based on the next-generation architecture code-named “Kepler.”

We spoke with Steve Scott to discuss the role of the GPU in this important milestone system and what Titan will mean to the HPC user community. Here is that interview.

The Exascale Report: NVIDIA is a leader in the mobile and desktop chip market, and now you are solidifying a position as a leader in the supercomputing market and have both ends of the market covered quite nicely. I also see that NVIDIA’s stock was up 4.4% as of yesterday, and that’s a 44% increase above the 52-week low. The company is in a very strong position right now and is exuding confidence. It seems to me that NVIDIA is doing a tremendous job of figuring out a business model that can survive, or maybe we should say surpass, the volatile nature of the HPC market segment. Would you care to comment on this?

SCOTT: Well, the business model is key. One of the things that attracted me to NVIDIA is that I believe NVIDIA has the right technology to get to exaFLOPS computing, and drive High Performance Computing in general, because of its unique approach to heterogeneous computing that results in greater energy efficiency. But also because of NVIDIA’s business model that allows us to support that development. It costs on the order of a billion dollars to create each new generation of high-performance processor, and the HPC market simply isn’t big enough to support that level of development. The fact that we have a very high volume market for our graphics processors, and that we can use the same chips for high-end graphics workloads as we can for HPC, allows us to get the best of each new generation. We simply couldn’t do what we are doing if we weren’t leveraging that large consumer-driven market.

TER: So it’s a matter of striking a pretty good balance in terms of your future R&D investment as well – and maintaining both sides of the equation?

SCOTT: Exactly.

TER: So Steve, you’ve seen a tremendous amount of new technology during your career, and I imagine it’s difficult for you to get really excited about new developments when they come along – but you do seem to be very enthusiastic about what’s going on here with Titan. What do you find most exciting – or most interesting, about this important milestone on the journey to exascale?

SCOTT: Well, Oak Ridge National Laboratory is really the world’s premier open computing facility. They have a tremendous track record of delivering large-scale systems through their partnership with Cray – and delivering science out of those systems, so I’m particularly excited to have this caliber of organization look to the future and make an ‘all in’ bet on GPU computing. I think they’ll do tremendous things with this system, and I’m looking forward to the many teams of scientists they will have generating results and moving their codes forward to what I think will prove to be a hybrid multicore future.

TER: And you’ve been involved with Titan for what…maybe 3-4 years now?

SCOTT: I’ve been involved with Oak Ridge through my position at Cray for, well, it must be 8 or 9 years now.

TER: So specific to this new system, what are some of the tradeoffs that had to be considered – or perhaps still need to be considered when it comes to building such a complicated hardware system, yet still creating a workable user environment?

SCOTT: Cray is really expert at building large, usable systems, from their custom interconnect to their software stack, but the NVIDIA component is really about how we’re going to move the compute nodes forward. And we’re in a new era now where we are completely constrained by power. So, power efficiency equals performance. NVIDIA’s architecture is explicitly heterogeneous, with the GPUs being designed from the ground up to be energy efficient when running parallel workloads. So we can really concentrate on the compute node itself, the memory system, and the functional units and memory hierarchy that will allow us to deliver more performance at much lower power consumption than a traditional CPU architecture. And the scalable aspects of the system are being addressed by the Cray interconnect and software stack.

TER: So Steve, give us a picture of what you see for next year – as we are getting ready for Supercomputing 2012 – and what will NVIDIA’s role be in the global HPC market at that point?

SCOTT: Well, I can’t comment specifically on the schedule for upgrading the Titan system with next-generation Tesla GPUs – they will be doing that during the second half of next year, and I imagine there will be a lot of excitement and activity around the Titan system as we prepare for Supercomputing 2012. I expect that this system is a signpost of where we will see momentum shifting in the high-end HPC market, and actually in the broader HPC market.

If you had looked at the Top 500 list three years ago – the June 2008 list – you would have seen no GPU-based supercomputers on that list. Now we’re in a situation where 3 out of the top 5 systems are powered by NVIDIA Tesla GPUs, and the number 3 system, Jaguar, is about to be upgraded to use NVIDIA GPUs. We’ve seen a tremendous amount of uptake in just a very short period of time. And because of this fundamental shift that has taken place in terms of power, I think we’re going to see continued uptake over the next several years.

We’re just excited to see Oak Ridge embracing this technology – and we’re excited to see what the world is going to do with this technology. It’s very gratifying to have customers that are doing the kinds of work that Oak Ridge and others are doing with high performance computing.

Read the full piece at The Exascale Report.