An Interview With Steve Scott, CTO, Tesla Business Unit, NVIDIA


According to Steve Scott, recently appointed CTO of the Tesla Business Unit at NVIDIA, “Oak Ridge’s decision to base Titan on Tesla GPUs underscores the growing belief that GPU-based heterogeneous computing is the best approach to reach exascale computing levels within the next decade.”

In the first phase of the Titan deployment, which is currently underway, Oak Ridge will upgrade its existing Jaguar supercomputer with 960 Tesla M2090 GPUs, based on the NVIDIA® “Fermi” architecture. These GPUs will serve as companion processors to the AMD multi-core CPUs. In the second phase, expected to begin in 2012, Oak Ridge plans to deploy up to 18,000 Tesla GPUs based on the next-generation architecture code-named “Kepler.”

We spoke with Steve Scott to discuss the role of the GPU in this important milestone system and what Titan will mean to the HPC user community. Here is that interview.

The Exascale Report: NVIDIA is a leader in the mobile and desktop chip market, and now you are solidifying a position as a leader in the supercomputing market, so you have both ends of the market covered quite nicely. I also see that NVIDIA's stock was up 4.4% as of yesterday, and that's a 44% increase above the 52-week low. The company is in a very strong position right now and is exuding confidence. It seems to me that NVIDIA is doing a tremendous job of figuring out a business model that can survive, or maybe we should say surpass, the volatile nature of the HPC market segment. Would you care to comment on this?

SCOTT: Well, the business model is key. One of the things that attracted me to NVIDIA is that I believe NVIDIA has the right technology to get to exaFLOPS computing, and drive High Performance Computing in general, because of its unique approach to heterogeneous computing that results in greater energy efficiency. But also because of NVIDIA’s business model that allows us to support that development. It costs on the order of a billion dollars to create each new generation of high-performance processor, and the HPC market simply isn’t big enough to support that level of development. The fact that we have a very high volume market for our graphics processors, and that we can use the same chips for high-end graphics workloads as we can for HPC, allows us to get the best of each new generation. We simply couldn’t do what we are doing if we weren’t leveraging that large consumer-driven market.

TER: So it’s a matter of striking a pretty good balance in terms of your future R&D investment as well – and maintaining both sides of the equation?

SCOTT: Exactly.

TER: So Steve, you’ve seen a tremendous amount of new technology during your career, and I imagine it’s difficult for you to get really excited about new developments when they come along – but you do seem to be very enthusiastic about what’s going on here with Titan. What do you find most exciting – or most interesting, about this important milestone on the journey to exascale?

SCOTT: Well, Oak Ridge National Laboratory is really the world's premier open computing facility. They have a tremendous track record of delivering large-scale systems through their partnership with Cray, and of delivering science out of those systems, so I'm particularly excited to have this caliber of organization look to the future and make an 'all in' bet on GPU computing. I think they'll do tremendous things with this system, and I'm looking forward to the many teams of scientists they will have generating results and moving their codes forward to what I think will prove to be a hybrid multicore future.

TER: And you’ve been involved with Titan for what…maybe 3-4 years now?

SCOTT: I’ve been involved with Oak Ridge through my position at Cray for, well, it must be 8 or 9 years now.

TER: So specific to this new system, what are some of the tradeoffs that had to be considered – or perhaps still need to be considered when it comes to building such a complicated hardware system, yet still creating a workable user environment?

SCOTT: Cray is really expert at building large, usable systems, from their custom interconnect to their software stack, but the NVIDIA component is really about how we're going to move the compute nodes forward. And, we're in a new era now where we are completely constrained by power. So, power efficiency equals performance. NVIDIA's architecture is explicitly heterogeneous, with the GPUs being designed from the ground up to be energy efficient when running parallel workloads. So we can really concentrate on the compute node itself, the memory system, and the functional units and memory hierarchy that will allow us to deliver more performance at much lower power consumption than a traditional CPU architecture. And, the scalable aspects of the system are being addressed by the Cray interconnect and software stack.
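Scott's point that "power efficiency equals performance" can be written as a one-line relation once the facility's power is capped. The sketch below is a first-order illustration, not a figure from the interview; in particular the 20 MW power cap used in the worked number is an assumed, illustrative value.

```latex
% Under a fixed power cap P_cap, delivered performance is set by energy efficiency eta
\text{Performance}\ [\text{FLOP/s}] = P_{\text{cap}}\ [\text{W}] \times \eta\ [\text{FLOP/s per W}]

% Illustrative only: an exaflop (10^{18} FLOP/s) under an assumed 20 MW cap
\eta_{\text{required}} = \frac{10^{18}\ \text{FLOP/s}}{2 \times 10^{7}\ \text{W}} = 5 \times 10^{10}\ \text{FLOP/s per W} = 50\ \text{GFLOPS/W}
```

With the power term fixed, the only remaining lever for performance is eta, which is why the design effort shifts to the compute node's functional units and memory hierarchy.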

TER: And obviously, as we’ve been publishing The Exascale Report for some time now, pretty much everyone has hit on power as the number one obstacle that must be overcome in order for us to actually ever get to exascale, which says to me that NVIDIA has a long, healthy road in terms of this journey over the rest of this decade.

SCOTT: Yes. It’s hard to overstate the importance of the shift that has happened. Somewhere on the road from teraFLOPS to petaFLOPS, we went through a fundamental transformation in the way that computing technology scales. For the past several decades we’ve been able to drop the voltage in proportion to the feature size of our transistors, so every time we halve the size of the transistors, we also halve the voltage, and we’ve had this wonderful, magical ride of exponential increases in compute performance with constant power per processor – and that trend has simply ended. We can no longer drop the voltage with each new generation in proportion to the feature size, and so we’ve become power constrained. If you put all the transistors you can now fit on a chip and run them at full speed, the chip would literally burn up. So, power efficiency has become the determinant of computing performance – and this will only continue to grow in importance for the foreseeable future. Every new generation will become exponentially more power constrained, so it’s all about reducing the overhead and making sure that a higher fraction of your watts are being used for actual computation – the adds and multiplies, shifts and compares that do the real work. And so there’s a tremendous advantage with the explicitly heterogeneous architecture that is represented by GPUs – and I think it really does point the way to a fundamental shift in the way we do high performance computing. This is going to be very good for NVIDIA over the next decade.
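The scaling shift Scott describes is commonly called the end of Dennard scaling. The derivation below is the standard first-order textbook model, not something taken from the interview: it ignores leakage power and, purely for illustration, assumes the clock frequency f still rises by the same factor kappa in both regimes, so that every transistor is running "at full speed."

```latex
% Dynamic switching power of CMOS logic (alpha = activity factor)
P = \alpha C V^{2} f

% Classic (Dennard) scaling by a factor \kappa > 1 (Scott's "halve the size" is \kappa = 2):
% L \to L/\kappa,\quad C \to C/\kappa,\quad V \to V/\kappa,\quad f \to \kappa f
P' = \alpha \frac{C}{\kappa} \left(\frac{V}{\kappa}\right)^{2} (\kappa f) = \frac{P}{\kappa^{2}},
\qquad A' = \frac{A}{\kappa^{2}}
\;\Rightarrow\; \frac{P'}{A'} = \frac{P}{A}

% Once V can no longer be reduced (V' = V):
P' = \alpha \frac{C}{\kappa} V^{2} (\kappa f) = P
\;\Rightarrow\; \frac{P'}{A'} = \kappa^{2}\,\frac{P}{A}
```

With kappa = 2, a chip whose voltage no longer scales would see its power density quadruple in a single step if every transistor switched at full speed, which is the "burn up" scenario in the quote.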

TER: So Steve – as CTO of the Tesla Business Unit, how close are you to what the user environment needs to evolve into as we get closer to these 20-petaflop-class systems? And what changes do you see for the users as they transition from Jaguar to Titan?

SCOTT: Well, users are going to increasingly have to worry about concurrency and parallelism since all future performance increases will come from parallelism – not from clock speed and not from complicated processors. They have to start thinking about exposing concurrency – exposing available parallelism in their applications.

Workloads will be moving to systems that are on the order of 100 million processor threads of computation to get to an exaflop, and so programmers have to be thinking about big problems, and they have to be thinking about the parallelism at the distributed-memory level, at the irregular multi-threaded or multicore level, and at the vector SIMD level in their codes simultaneously. So this is going to be a major challenge for users. Also, users are going to have to start paying more attention to resiliency sometime before we get to exaFLOPS.
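The three levels of parallelism Scott lists can all appear in one loop nest. The C sketch below is an illustration under stated assumptions: the array sizes are arbitrary, the names update_block, update_all and grid_sum are hypothetical, and the distributed-memory level is only simulated by an outer loop rather than by real MPI processes. The OpenMP pragmas take effect when compiled with OpenMP enabled (for example GCC's -fopenmp); otherwise they are ignored and the code runs serially with the same result.

```c
/* Illustrative sizes only; not taken from Titan or any real application. */
#define N_RANKS 4    /* level 1: distributed-memory partitions (MPI ranks in a real code) */
#define ROWS    64   /* rows owned by each partition */
#define COLS    256  /* contiguous columns: the SIMD-friendly dimension */

/* Update one partition in place: each element becomes a * element + 1. */
static void update_block(double block[ROWS][COLS], double a) {
    /* Level 2: rows are independent, so they can be split across threads/cores. */
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++) {
        /* Level 3: contiguous, dependence-free inner loop suits vector SIMD units. */
        #pragma omp simd
        for (int j = 0; j < COLS; j++) {
            block[i][j] = a * block[i][j] + 1.0;
        }
    }
}

/* Level 1 stand-in: in a real code each partition would live in a separate
   MPI process with its own memory; here a plain loop visits them in turn. */
void update_all(double grid[N_RANKS][ROWS][COLS], double a) {
    for (int r = 0; r < N_RANKS; r++) {
        update_block(grid[r], a);
    }
}

/* Serial reduction over every element, used to check the result. */
double grid_sum(double grid[N_RANKS][ROWS][COLS]) {
    double s = 0.0;
    for (int r = 0; r < N_RANKS; r++)
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                s += grid[r][i][j];
    return s;
}
```

Starting from a zero-initialized grid, one call to update_all with a = 2.0 sets every element to 1.0. The point of the layout is that each level is exposed explicitly: partitions for distributed memory, independent rows for threads, and a contiguous inner loop for vector lanes.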

We'll try to build systems that are as reliable as possible, but a system of that scale is not going to remain up long enough to run a large computation over a period of days or weeks without having some failures, and checkpoint/restart is not going to scale very effectively, so we're going to have to move toward having resilient applications themselves. And this is going to creep in, in ways that are not yet well understood by users, so there will be some impact on the applications.
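A first-order model shows why checkpoint/restart stops scaling. The sketch below assumes independent node failures with a constant failure rate and uses Young's classic approximation for the optimal checkpoint interval; the node MTBF, node count and checkpoint time in the worked numbers are hypothetical values chosen only for illustration.

```latex
% N nodes, each with mean time between failures M_node (independent failures)
M_{\text{sys}} \approx \frac{M_{\text{node}}}{N}

% Young's approximation: optimal interval between checkpoints that each take time \delta
\tau_{\text{opt}} \approx \sqrt{2\,\delta\,M_{\text{sys}}},
\qquad
\frac{\delta}{\tau_{\text{opt}}} \approx \sqrt{\frac{\delta}{2\,M_{\text{sys}}}}

% Hypothetical example: M_node = 5 years (43{,}800 h), N = 50{,}000 nodes, \delta = 10 min
M_{\text{sys}} \approx \frac{43{,}800\ \text{h}}{50{,}000} \approx 0.88\ \text{h} \approx 53\ \text{min},
\qquad
\tau_{\text{opt}} \approx \sqrt{2 \cdot 10 \cdot 53}\ \text{min} \approx 33\ \text{min}
```

In this example roughly 30% of the machine's time goes to writing checkpoints, before counting the work lost after each failure, and that fraction keeps rising as N grows and M_sys shrinks. That is the pressure toward applications that can tolerate failures themselves.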

TER: So there has been some criticism in the past, fair or not, that the GPU environment is a difficult one for people to get a handle on, that the programming is difficult and so forth. What steps are you putting in place at NVIDIA to improve the programmability or the user environment in that respect?

SCOTT: Great question. There’s actually been a tremendous amount of innovation over the past several years in the CUDA software environment. CUDA originally made GPUs programmable with a high level language as opposed to trying to pretend everything was a bunch of triangles you were trying to paint on the screen. And, as we’ve gone from CUDA 1 to CUDA 4, the number of software features, libraries and tools that allow you to do debugging and performance tuning have grown in both number and sophistication, and have become easier to use.

Right now, one of the things that we are quite interested in is using directives. We think there is a lot of potential for broader adoption of GPU computing by using directives that are very similar to those you would use for multicore CPUs. These directives will allow you to easily expose the parallelism and have the compiler and runtime then map that parallelism on to the underlying hardware.

In fact, what we found in the first year or so of looking at directives for GPUs is that when people take the effort to tune their applications to work well with directives, it not only makes the code run faster on GPUs, but it also makes the code run faster on CPUs – because you’re really making the code better by exposing more parallelism.

So I think that a combination of a more robust and easier to use CUDA programming environment, as well as new support for directives, allows you to have full portability and get at the performance of GPUs simply by putting directives or annotations into your existing code. And, it will more or less eliminate the barriers, real or perceived, of using GPUs from a programmability and portability perspective.
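To make the directive approach concrete, here is a minimal SAXPY (y = a*x + y) sketch in C. It uses standard OpenACC and OpenMP pragma syntax, but the function names saxpy_gpu and saxpy_cpu are hypothetical and this is not NVIDIA's own example code. The loop body is identical in both versions; only the annotation differs. Compiled with an OpenACC-capable compiler (for example GCC with -fopenacc), the first loop may be offloaded to a GPU; compiled with OpenMP (-fopenmp), the second is spread across CPU cores; a plain C compiler ignores both pragmas and runs the loops serially.

```c
/* GPU-oriented version: OpenACC directive with explicit data movement.
   copyin: x is only read on the device; copy: y is read and written back. */
void saxpy_gpu(int n, float a, const float *restrict x, float *restrict y) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Multicore CPU version: the same loop, annotated with an OpenMP directive. */
void saxpy_cpu(int n, float a, const float *restrict x, float *restrict y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

For x = {1, 2, 3, 4}, y = {10, 20, 30, 40} and a = 2, either function leaves y = {12, 24, 36, 48}. Because the source stays ordinary C with annotations, the same file remains portable across compilers and targets, which is the portability argument made above.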

TER: What steps will NVIDIA be putting in place to help ensure we don’t see another situation like we just did with IBM, NCSA, and Blue Waters?

SCOTT: Well, that might be more of a question for Cray than for NVIDIA. I was quite surprised that IBM decided to pull the plug on that program. It calls into question the commitment that large companies can make when they are doing something that isn't core to their company. At NVIDIA, high performance computing is core to our company. Massively parallel processors are what we do. They are in our wheelhouse, and there is no chance that we're going to decide that we don't want to continue to push performance per watt and parallelism. So, when you have a large company doing something that is not directly related to their core market, I think you may have to worry. But, at NVIDIA, building these sorts of processors is our core focus, despite the fact that a majority of those processors go into systems where they are running graphics – they are actually the same processors that run fluid dynamics or structural analysis or climate models.

TER: How do you keep all the partners working in harmony moving forward? Are there any milestones or planned audits to ensure that the collaborating partners are keeping in sync?

SCOTT: Well I don’t think it’s a matter of establishing specific milestones in delivering the Titan system. We have a daily / weekly ongoing interaction with Cray and Oak Ridge, porting the applications to be ready for Titan when it powers up, and we are also working on a variety of issues having to do with the system software and the interaction with the hardware and other aspects of integration into the system. So Cray and NVIDIA have a very tight relationship – we have a tight relationship with our other OEM partners as well – and there is just no way you can build a system of this scale without that. I think the 3-way partnership with Oak Ridge, NVIDIA and Cray has been fantastic. I was involved in it quite heavily in my capacity at Cray, and continue to see that relationship really work effectively now that I work for NVIDIA.

TER: So Steve, give us a picture of what you see for next year – as we are getting ready for Supercomputing 2012 – and what will NVIDIA’s role be in the global HPC market at that point?

SCOTT: Well I can’t comment specifically on the schedule for upgrading the Titan system with next generation Tesla GPUs – they will be doing that during the second half of next year and I imagine there will be a lot of excitement and activity around the Titan system as we prepare for Supercomputing 2012. I expect that this system is a signpost of where we will see momentum shifting in the high-end HPC market, and actually in the broader HPC market.

If you had looked at the Top 500 list three years ago – the June 2008 list – you would have seen no GPU-based supercomputers on that list. Now we're in a situation where 3 out of the top 5 systems are powered by NVIDIA Tesla GPUs, and the number 3 system, Jaguar, is about to be upgraded to use NVIDIA GPUs. We've seen a tremendous amount of uptake in just a very short period of time. And because of this fundamental shift that has taken place in terms of power, I think we're going to see continued uptake over the next several years.

We’re just excited to see Oak Ridge embracing this technology – and we’re excited to see what the world is going to do with this technology. It’s very gratifying to have customers that are doing the kinds of work that Oak Ridge and others are doing with high performance computing.
