Is Exascale Development on the Right Path? Cray’s CEO Talks About The Road to Exascale


The race to exascale is starting to raise some interesting questions that go well beyond technology. At the core of recent deep thinking is the question of responsibility, accountability and funding. The R&D expense associated with exascale will be staggering. What price are we willing to pay, in real dollars and in non-monetary terms? And who's going to foot the bill? Government agencies? Private industry? Which is the heavier burden: responsibility to taxpayers or responsibility to shareholders? And, deeper still, should we even be pursuing exascale at such an aggressive rate in order to hit a predetermined milestone?

To discuss many of these questions, The Exascale Report sat down with one of our industry’s true thought leaders, a man who knows a lot about shareholder accountability, Pete Ungaro, CEO of Cray.

The Exascale Report: First off, how about a reality check? The general consensus has everyone targeting an exascale system by 2018. Is this achievable?

Peter Ungaro: Yes, I believe it is achievable. The big questions are how expensive the system will be, how much power it will use and how broadly usable it will be. I don't think that anyone (any company or any institution) can predict every technology trend that will be needed to get us there, but with focus and flexibility, it is achievable. That said, it will come at a higher cost, as 2018 is quite an acceleration from where the market would naturally be at that time. At Cray, this is a big part of what we're spending our R&D efforts on right now. I believe the date when the first real (not stunt) exascale system hits the floor will be determined more by financial trade-offs and mission drivers than by technical ones.

TER: Now for a more philosophical question. Even if the goal is achievable, should we be trying to achieve it? Why not take a gentler slope into the intense innovation that will be required and do it over a longer timeframe, reducing the investment needed? Doing it in 10 years instead of 20 takes more absolute money, and more money per unit of time. Mankind is facing other fundamental challenges in all kinds of areas, and public investment in science is basically a zero-sum game. Do we need exascale computing at the expense of a cure for cancer?

Ungaro: I think the answer to this is straightforward: we should invest in the areas where we can get the best payoff. With new countries entering the supercomputing game, we are seeing what I would call the most exciting overall change in the supercomputing market in the last 20 years. Over the last two decades, the market has mostly revolved around a handful of countries, but today several new countries, such as China and Russia, are developing plans for petascale and exascale computing and are basing that investment on the belief that it is critical to their global competitiveness.

So the question we should be asking ourselves is whether this incremental investment in exascale computing will pay off. To me, it is relatively arbitrary exactly which metric we pick to measure ourselves (flops, timeframe, peak or sustained). It is more important to decide whether we need exascale computing to improve our economy, fuel innovation and the jobs that innovation can generate, find new sources of energy and become a greener planet, protect our borders, and solve critical problems such as finding cures for cancer and improving people's quality of life. I believe that supercomputing can play a huge role in accomplishing these things, which is one of the reasons I have personally stayed in this business.

If I got to choose, my vote would be that the extra investment needed to push for exascale computing as early as possible is incredibly small compared with the benefits it can bring. The bigger question is how the others who are in a position to make this decision will vote on this issue.

TER: Cray is a public for-profit company, in business to return value to its shareholders. Getting to exascale will involve a tremendous amount of innovation, which is likely to be very expensive. Is there a market in the next ten years that would justify the investment today? What is the business model for exascale at a company like Cray?

Ungaro: This is the main question that we wrestle with at Cray. Can a company be financially successful while being primarily focused on the supercomputing market? We absolutely believe this is possible; in fact, I believe we are doing it today. We have to do it in a different way, partnering with others for certain key technologies to build best-of-breed systems, but adding a lot of our own innovation specifically designed for the high-end marketplace. Exascale, we believe, will let us take this business model to the next level, as the technologies developed for exascale (such as power efficiency, scalability and programmability) will be leveraged into the broader, yet still high-end, part of the HPC market and form a business model that a company of our size can work well within. In fact, I believe it is harder for the bigger companies. While they have a lot of resources, they are also under much more pressure to please a larger market and a broader set of customer needs. This works well in the "normal" part of the computing market, but when you push toward the highest end, the market bifurcates and you need a completely different business model. This is why, if you compare Cray to other successful HPC companies, you see completely different business models at play. We absolutely believe that we can BOTH be a leader in supercomputing and build a great business; it doesn't have to be one or the other. In 2005, we restructured our company for exactly this, and it has begun to pay off for both our customers and our shareholders.

TER: Cray is a very small company relative to the size of the operating funds that could be required to build an exascale computer. Have you thought about what kind of relationship you’ll need with your exascale customers to enable you to work the mechanics of acquiring parts and financing an exascale computer through delivery and acceptance?

Ungaro: This really goes back to your earlier question about the business model needed. No one is going to buy a usable exascale computer from a catalog; it is going to take a partnership. That means a partnership between companies that have some of the core technologies, companies that can pull those technologies together and innovate around them to make them work at the exascale level, and customers who are willing to work with companies like Cray to make it all happen. Many different models have successfully been used to build leading-edge systems. We are doing one with DARPA right now that is more of an R&D partnership, which ends when we reach demonstrable prototypes, at which point Cray takes over the productization process. We did this with Sandia, where they bought a system that hadn't yet been developed and we worked together to co-design, develop and field it. Virtually every new system we've ever developed at Cray has been done through a partnership between Cray, a customer with a unique need and a number of other partners.

So we know that there are different models out there that can make it all work. Because this is our only business, we are able to work within the constraints of the customer and various partners to find a path that makes good sense for everyone. It is clear that it will require extra funding to accomplish the innovation needed, and some early funding to source all of the components, but this is just part of the high-end business these days. I believe everyone who is truly in this business understands this, and I don't believe Cray's needs are very different from those of any other company that could legitimately attempt to reach exascale first. The funding we need is more about fielding the best system than a way to "convince" us that allocating our resources in this manner is the right thing to do. We are already convinced; we just need to work with customers and partners to find a viable path that works for everyone.

TER: There is still a lot of speculation about what exascale machines can or should look like. What are the major elements of Cray's exascale architecture as you see them today? What is the gem, or the unique strength, that Cray brings to the quest for exascale?

Ungaro: While we can get into long discussions about exactly what the best exascale systems will look like, one thing is clear: they will be at huge, almost unthinkable, scale. That is why I believe we are going to be a major player in the supercomputing segment for a very long time. I don't believe that anyone is as good as we are at thinking through how all the pieces need to come together to build the largest supercomputers on the planet.

Cray is all about scalability. It is what we think about before we even start sketching out ideas on a whiteboard. What architectures will work at scale, and what do they need in terms of networks, topologies, packaging, supervisory systems and so on? What software is needed: operating systems, runtimes, programming environments and file systems, to name a few? What is needed to get applications to scale: compilers, libraries, tools? Not just the technologies that Cray does or is willing to do, but a broad understanding of what is out there and what needs to be developed to stitch it all together into a single system that works at full scale.

Our sole corporate focus on the supercomputing market is what I like to call the magic of Cray. We are extremely good at thinking about all the different things needed to stand up systems at extreme scale. Things that work well in clusters typically fall apart at scale, especially if you want to run a single challenging application across the entire system to achieve a breakthrough (and no, the HPC Linpack benchmark doesn't count!). Even the way you service systems and interact with customers changes when you field systems at the extreme. We have built our entire company around this difference, and that's the key value we provide to the marketplace. We're not perfect, and there is always plenty of room for improvement, but I believe we are extremely good at it.

TER: Building an exascale machine is one thing (if you aren’t concerned with power or build costs, you can build one today), but building one within a reasonable power envelope creates a design challenge that many feel will radically alter the model of computing. Do you see exascale as evolutionary or revolutionary?

Ungaro: I believe it is going to need a bit of both, but more evolutionary than revolutionary, because it needs to work with a wealth of existing applications that can't all be rewritten overnight, or even over 10 years. That said, it is going to have to push on revolutionary technologies in order to hit the right balance between performance, size and power usage. I believe these new technologies will have much broader appeal, not only across the HPC market but in many other computing-related markets.

What counts as a reasonable power envelope is another discussion altogether. The largest datacenters out there today are north of 100 megawatts, so that is probably about the maximum. The problem is that operating costs at 100 MW are extremely high, so our goal is to drive the power needs of the first exascale systems down to one-third, one-quarter or even one-fifth of that number. The viable power envelope will clearly be a major factor in how quickly the first exascale systems are fielded.

TER: If you see a revolutionary future in exascale machines, that means new operating system software (and probably runtime environments), new development tools, and new applications to go with the radical new hardware. Can Cray create a total solution? If not, where do the other elements that Cray won’t do come from? Which elements are these? Can any company create an exascale system? Can any one nation?

Ungaro: This one is easy: there is zero chance that any single company, institution or even country, no matter how large, will be able to create an exascale solution that is best-of-breed in all areas. I also believe that if you have to redevelop everything from scratch, the system won't become productive until it is already out of date, because the applications that can leverage it won't be ready in time.

For exascale to have the societal impact I believe it can, it is going to take a very large partnership across a number of companies, institutions and users. Some people call it "co-design," and I like that term. In fact, that is how Cray has been developing systems for many years. We have often done it in partnership with our customers and their users, and we have opened up our development efforts to a wide range of outside people who bring new ideas and concepts to the table, which we then pull together. Our efforts with Sandia on Red Storm/XT3, Oak Ridge on GPU-accelerated supercomputers and DARPA on Cascade are recent examples of that history. There is a very vibrant global community of innovative companies and thought leaders that we are already working with on exascale, and we will draw on more of them over time.

Without a doubt, I believe that Cray will be a major systems provider in the exascale era, but we will get there with a large set of partners. To give you a couple of easy examples: we won't build the processor or memory that goes into our first exascale systems; we will partner with other companies for those. The core operating system kernel in the system will leverage Linux, and we'll be working with the user community and efforts such as the International Exascale Software Project (IESP) to reach agreement on programming models and frameworks for application resiliency.

TER: On the topic of global cooperation, so much exascale development effort seems to be focused on Europe with various collaborative research labs. Why Europe?

Ungaro: I can answer this from a Cray perspective. Europe is a very fast-growing and important market for us, and it organized, through initiatives such as PRACE, very early in the game, when other countries, including the U.S., were still in planning mode. So it was an easy decision: we wanted to tap into the incredible talent base within Europe and also grow our own R&D efforts there. Cray is a global company, and it would be both a bad technical decision and a bad business decision to ignore what is going on outside the U.S. as we build our company. Our European Exascale Initiative was launched in 2009 and has continued to grow in both the breadth and depth of the projects it is working on. Of course, we are working on similar, complementary efforts with a number of other countries and institutions around the world.
