An Interview with AMD’s Vlad Rozanovich



AMD is the king of leapfrog and a driving force in moving HPC forward. The company is often drowned out in publicity circles by Intel chatter, but it deserves to be heard.

Their Interlagos processor has once again raised the bar and will be at the heart of the new Titan system being built by Cray.

We have no doubt AMD will play a key role in driving us forward on the journey to exascale.

Just in time for the fall pumpkin season, as we all prepare for SC11, we're pleased to bring you this interview with Vlad Rozanovich, AMD's Director of North America Commercial Business.

The Exascale Report: So Vlad, most people seem to think that once you ship the processors to Cray, AMD's part in this massive effort is finished. But we know that is far from the case. Can you talk with us about how AMD has worked, and continues to work, with partners such as Cray, ORNL, NVIDIA, The Portland Group and others to ensure the best possible and most productive user environment when bringing such a monumental system online?

AMD: Mike, that's a great question. I'd start by pointing out that the work doesn't just happen after the sale. A lot of what AMD does with the organizations you mentioned (Cray, Oak Ridge, and the compiler teams we work with), especially around Interlagos, was done well in advance. When AMD looks at how to bring a product to market, the feedback we get from important end-user customers like Oak Ridge and from important OEMs like Cray goes all the way down to the feature sets we build into the processor itself. Those feature sets include a number of instructions that benefit deployments like Titan. That's where a lot of this starts, and it continues from there. Looking back, what AMD has done in the HPC environment is really focus on deployments like Titan and work with compiler organizations such as PGI, and in other cases GCC and Open64, to make sure their compiler releases take advantage of the hooks our Interlagos product brings to market. In the case of PGI, that means making sure that release 11.9, I believe, takes full advantage of the hooks we've built in for Interlagos, and some of those hooks go right down to the instruction set.

Now, to your question of what we do after the fact: the beautiful thing about a design like the Titan cluster is that Oak Ridge has decided to take what really is the best of all worlds. There are very intelligent people at Oak Ridge, and for the professors and researchers working on those clusters, continuous training has to happen. Obviously one of the biggest topics is how to program in parallel, so from an AMD perspective we want to make sure everyone involved really understands the instruction sets AMD is bringing to market, such as FMA4 and XOP, that benefit high performance computing codes: how they can start programming around them, and how that then works with GPGPU technology. In the end, once you put all of that together, you get an instruction set perspective; a compiler perspective that uses those instruction sets; the Cray system, with the interconnect architecture it brings to deliver that level of parallelism; and finally, teaching the researchers how to use all of it. That's really what has to happen to get a system like this up and running and fully capable.

TER: So there is an ongoing training and education portion to this project?

AMD: Absolutely, Mike. As I said, from AMD's perspective that training and education started a while ago, by working with the OS and compiler organizations to make sure they put those hooks in. More importantly, through our developer outreach on developer.amd.com and the support we offer through our developer programs, we make sure researchers can take advantage of those hooks. In the case of Titan, that means that when researchers look at how to parallelize their applications, whether in climate modeling or materials science, they know about the advantages we have built into the Interlagos processor and can use them.

TER: What extra effort is required because of the scale of this system that wouldn’t be required on typical large cluster systems?

AMD: When we look at the size this cluster will eventually reach, with a target of 10 to 20 petaFLOPS, you are obviously talking about a system that could end up at the top of the TOP500 list. So from an AMD perspective, we're very excited to be part of such a system, and we know a lot of work now has to happen to get it up and running. That includes the support AMD will offer Cray, the support we offer through our developer outreach, and the training we offer on how to use the instruction sets. Take FMA4 and the hooks it provides for vector and matrix multiplication, specifically in chemistry, physics, and quantum mechanics codes: those are advantages researchers can use to speed up their code. That matters especially as they look at the CPUs becoming, in effect, the service processors of the system. How do they interact with the GPGPUs, and which codes could actually transition between those two devices? That's going to be an important, ongoing part of the Titan project.

TER: Clearly one of the challenges that we’ve faced in HPC over the past two decades has been trying to get productivity out of these incredible platforms. It sounds like there is a lot of effort on your side – and among the partners – to ensure that this isn’t the case with Titan.

AMD: I agree, and I can't emphasize enough that this is truly a partner effort. When you are dealing with something this complex, you need all the components working together, and you really need to understand each of them. Oak Ridge has a long history with AMD: first the Jaguar deployment, and now Titan, where they've upgraded from Istanbul to our newest Interlagos product. Their history with the AMD core architecture goes back a long time. So what AMD wants to contribute to this effort is not only software work but also the integration work we are doing with Cray.

Cray really shows what talented people at an OEM can do when they bring all these components together: a best-in-class CPU, a best-in-class interconnect, best-in-class memory and I/O infrastructure, all integrated with a GPGPU environment. As you said, this is probably going to be one of the most complex systems ever created, but it also has the most potential. Because of the relationship AMD has had with Oak Ridge over the years and the commitment we've made to high performance computing, we were thrilled that they chose AMD for this refresh, this upgrade, this new complex deployment. I think a lot of that has to do with the feedback they've given us over the years, such as, "OK, here's how you can improve the CPU." When they work with people at AMD like Chuck Moore, our Senior Fellow and the architect of the Bulldozer architecture, and give our engineers really good feedback on what they want to see, those are the things we want to bring to market. And obviously, as you said, it's not just what we bring to market; it's how researchers actually take advantage of those components.

TER: So Vlad, putting a system together of this size, with this many processors, operational problems come with the territory. What are you doing with the partners to address resiliency?

AMD: When I look at resiliency and uptime, the question is how a large system like this can not only be fully operational but also bring years of revenue generation to Oak Ridge. Obviously, from AMD's perspective, we're going to support it fully, from both a bring-up perspective and a programming perspective. The integration I mentioned earlier with Cray, and the OEM knowledge Cray has of the AMD architecture, is another thing we're going to rely on pretty heavily. So what is AMD focusing on to make this system as available and reliable as possible? There's no question in my mind that it comes down to software support and hardware technical support, provided in advance as needed. We're working with those organizations to make sure that whenever there are questions or potential for improvement, there is a line of communication back to AMD, so we can say: here are some things we could be doing better, and here are some of the flag settings you can use in the code to really take advantage of the Bulldozer architecture. Those are the things we want to make sure of as the bring-up of this system goes along, because we know it's complex.
We've seen a lot of clusters in the past where it sometimes took a year after deployment before they became fully operational. In many cases over the last decade, a system has been brought up promising a certain performance level, and you're always going to hit some snags. Some of those snags might be interconnect related, and some might be performance related. So from my perspective, the great thing here is that multiple organizations really understand what's being created, and all the support AMD can offer, from a developer perspective, a technical resource perspective, and an architectural understanding perspective, we're going to provide to both Cray and Oak Ridge.

TER: So with the recent situation that took place with IBM, NCSA, and Blue Waters, IBM has understandably taken a lot of heat from the community and their commitment to HPC has been challenged. And, at about the same time, Intel has stepped up rather strongly to claim a position of exascale leadership over the rest of this decade. Share with us your thoughts on AMD in terms of a position of exascale leadership and the journey to exascale. What role will AMD play?

AMD: When we look at exascale, Intel has taken a very visible role, putting people in place on Capitol Hill and really trying to influence the federal government with its commitment to HPC. The one thing I'd say, Mike, is that when I look at what AMD has done over the years, not only on exascale but in promoting HPC in general, it's something that has always been core to AMD's success since we entered the server market with Opteron back in 2004-2005. When we look at how we introduce our products and how we want to drive supercomputing forward, especially toward exascale, it's something AMD absolutely considers, and we want to make sure we are relevant in all those discussions. We gather feedback from the national labs, from government institutions, from private industry that uses some of those deployments, and most importantly from the many universities that are really the sponsors and beneficiaries of these clusters. There's no question that, from AMD's perspective, there are many different paths to exascale, and if you look at GPGPU computing and the way it's now becoming of interest to so many people, the key is really the programming environment.

AMD has a history of delivering firsts to the marketplace, and a lot of them have benefited HPC: the first 64-bit x86 processor, the first dual-core, and the APU, which we first introduced for the client market and tested in laptop and desktop deployments. Those are the kinds of things AMD is really looking at. We're going to contribute to exascale, and we're going to contribute to the people who are really influencing where exascale is going. We've been a backbone of HPC for a long time, and it's certainly not something we're going to exit now. One thing I've seen is that the work done at the federal level with programs like exascale really produces the game changers that lead into industry, whether you look at climate modeling, physics or chemistry codes, or mathematical modeling such as Monte Carlo financial simulation. Large deployments like exascale feed into what I'll call private-sector and corporate enterprise advantages in the code deployments that happen over the following years. So from AMD's perspective, we want to keep talking to all the relevant parties in exascale, and we want to make sure we're listening to what they want to see from us in the future, whether in 2018 or beyond. The feedback we get today is critical to making sure our roadmap fits what HPC really needs in that timeframe.

TER: Vlad, thank you for a very good interview.

AMD: Mike, I appreciate it. Thanks for the time and the opportunity. The last thing I'd say is that, with HPC and exascale, this is an exciting time for us. The industry is on a path where, as we look at the big problems in the world, systems like Titan are what will be built to solve them. From AMD's perspective, we're happy to contribute, and it excites me personally to be involved in this.

For related stories, visit The Exascale Report Archives.