Renowned HPC System Architect Alan Gara Talks About Exascale Pathfinding

Confirms Intel’s Vision for Co-design and Fabric Integration

Among a very elite group of HPC experts, Alan Gara is widely recognized and respected as one of the HPC community’s true visionaries and authentic leaders. He not only has a good grasp on what it will take to build a system capable of exaflops, he also directs a world-class technical team currently researching and driving every aspect of exascale development. And it doesn’t hurt that he happens to be doing this under the Intel umbrella where HPC has become somewhat of a keystone initiative.


Gara is of course best known for his role architecting the IBM BlueGene architecture. He joined IBM in 1999, was named an IBM Fellow in 2006, and left to join Intel in June 2011. He went from leading IBM’s exascale system research efforts to his current position as Intel’s Chief Exascale Architect in a group Intel refers to as “pathfinding” – an area that falls between research and products.

At Intel, he is once again focused on critical high-end scaling issues such as low power operation, innovative cooling, resiliency, emerging memory technology, and the next generation of interconnect technology – everything that will need to come together to form the future architecture of exascale.

The Exascale Report is pleased to bring you this feature interview with Intel’s Chief Exascale Architect, Alan Gara.

The Exascale Report: I’m curious about the title of “pathfinding.” Does this hold a special distinction within Intel? How is pathfinding different from research?

GARA: I think it is fairly unique to Intel. I’m not certain of that, but I’m not aware of it being used this way in another organization. At Intel, it has a well-defined meaning. Pathfinding is very much what it sounds like: an early stage in our product development process where we define the high-level direction we will be taking the product. Because it is part of the product development process, it has a well-defined schedule and set of completion criteria. Research, which comes before pathfinding, is more unconstrained in terms of timescale as well as the areas that can be explored. In research, we take some high risks and we don’t always anticipate that all research projects will turn into products. In pathfinding, we exit with a clear direction and the necessary elements of product direction.

The Exascale Report: At the recent symposium held at Argonne to celebrate thirty years of parallel computing, you gave a presentation titled, “Application Challenges Resulting From Technologies of the Future.” I find this title quite intriguing. It seems that, in order to understand the application challenges of the next 7-10 years, we need to have a handle on what the new technologies might be that developers will have to work with. Yet it seems like almost every aspect – every technical detail – related to the technologies of the future, particularly with exascale, is up in the air right now. Have you determined some approaches that application developers could actually be using today to ensure they have code that will be scalable on exaflops machines?

GARA: It is true that we anticipate that technologies will play an important role in defining systems of the future and, correspondingly, the things users will need to do to extract performance. This is not as new a direction as it might sound. Users have been adapting to the realities imposed by system architectures for a long time. The most obvious example is that our inability to continue increasing frequency has resulted in users needing to exploit much larger degrees of concurrency.

There are a number of branch points in the future that technology will drive. There will be things that dictate whether we go one way or another, and none of us can predict today which way it will go. However, there are some things that we do anticipate and know will be there as part of all possible directions for exascale. It’s really more a question of degree. Some technologies can, in a sense, save the day – and make the switch easier.

One piece of advice that I like to give to users is that they should really be focusing on threading their applications to enable them, from a system architecture perspective, to exploit as much performance as possible from a finite amount of memory – or a finite state of their problem. The reason that’s important is that memory itself is such a big swinger in the whole picture. If you look at current systems and the way they are balanced, the amount of silicon dedicated to memory is actually quite high, but it’s also not scaling as fast as we are scaling the performance of the compute. And we see that issue getting harder and harder. It’s already skewed quite a bit. There’s already much more silicon involved in memory than there is in the processor – even when you take into account the difference in the cost of wafers, it’s still skewed considerably. Therefore, as we go forward, we can’t just assume that memory scales at the same rate as compute performance. Or, if we really do get one of those revolutionary new memory technologies that come in with the right bandwidth characteristics, resiliency characteristics, and so on, then maybe we don’t have to push quite as hard on that dimension. But in any case, you will have to thread – it’s just a question of how much – and while new memory technologies can really help alleviate some of this, they won’t completely eliminate it.
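Gara’s point about compute outpacing memory can be made concrete with some back-of-the-envelope arithmetic. The figures below are illustrative round numbers chosen for the sketch, not Intel projections or data from the interview:

```python
# Illustrative bytes-per-flop balance arithmetic.
# All bandwidth and flop figures are assumed round numbers.

def bytes_per_flop(mem_bw_gbs, peak_gflops):
    """Memory bandwidth (GB/s) available per unit of peak compute (GFLOP/s)."""
    return mem_bw_gbs / peak_gflops

# A petascale-era node, roughly: ~50 GB/s bandwidth, ~500 GFLOP/s peak.
petascale_balance = bytes_per_flop(50, 500)            # 0.1 bytes/flop

# A hypothetical later node: compute grows ~100x, but memory bandwidth
# only ~20x under a similar cost and power envelope.
exascale_balance = bytes_per_flop(50 * 20, 500 * 100)  # 0.02 bytes/flop

print(f"petascale-era balance: {petascale_balance:.3f} bytes/flop")
print(f"exascale-era balance:  {exascale_balance:.3f} bytes/flop")
# The balance shrinks 5x: each byte fetched must feed ~5x more arithmetic,
# which is why threading for data reuse within a finite memory matters.
```

Under these assumed numbers, the usable bandwidth per flop falls fivefold, which is the skew Gara describes: applications that reuse data across many threads feel the pinch far less than those that stream through memory.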

We need to transition our thinking from energy efficiency at the transistor level – to energy efficiency at the system level.

One interesting thing about the technologies of the future is that while in the past we have often grappled with what we could no longer do, in the future we see technology areas which open up the possibility of being able to do things that we could not do before. An example of this is in the area of memory technologies which will potentially allow us to turn the clock back a bit on the tradeoffs between bandwidth and capacity. We currently can’t have both which results in the layered cache hierarchy. Some of the new memory technologies allow us to change this constraint somewhat. I am not suggesting that we will be able to eliminate caches entirely but some of these technologies do have the potential of simplifying it somewhat.

The Exascale Report: Considering all the technical breakthroughs we need in order to reach exascale, how important is a new memory technology to achieving this goal?

GARA: Achieving an Exascale will be an amazing accomplishment, one likely to be focused initially on solving important, highly scalable problems. The biggest challenge to reaching Exascale is to do it in a manner that enables accessible performance, reasonable total system power, high reliability, and reasonable cost. And… to achieve all this in a reasonable timescale. We know how to do each of these in isolation, but doing all of them simultaneously represents the real challenge.

Memory comes into the Exascale challenge in a number of ways. The most important dimension is energy efficiency. This is more a memory microarchitecture innovation as opposed to a fundamentally new physical device. For us to achieve Exascale, we need to dramatically reduce the energy needed to access memory. Of course, there is also the possibility that new device technology could help energy efficiency. Right now, though, most of the energy associated with memory is not attributable to the actual physical memory cell.
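The scale of the memory-energy problem can be sketched with simple arithmetic. The per-byte energy constants below are ballpark assumptions for the sake of illustration, not Intel figures:

```python
# Back-of-the-envelope memory power budget at exascale.
# All constants here are illustrative assumptions.

EXAFLOPS = 1e18          # target compute rate: 10^18 flop/s
BYTES_PER_FLOP = 0.1     # assumed memory traffic per flop
PJ_PER_BYTE_DRAM = 50.0  # assumed off-chip DRAM access energy, pJ/byte
PJ_PER_BYTE_GOAL = 5.0   # assumed target with improved memory microarchitecture

def memory_power_mw(pj_per_byte):
    """Memory-subsystem power in megawatts for the assumed traffic rate."""
    watts = EXAFLOPS * BYTES_PER_FLOP * pj_per_byte * 1e-12
    return watts / 1e6

print(f"at {PJ_PER_BYTE_DRAM:.0f} pJ/byte: {memory_power_mw(PJ_PER_BYTE_DRAM):.1f} MW")
print(f"at {PJ_PER_BYTE_GOAL:.0f} pJ/byte:  {memory_power_mw(PJ_PER_BYTE_GOAL):.1f} MW")
# Under these assumptions, memory traffic alone costs 5 MW at DRAM-like
# energies. Most of that energy is in the interface and data movement, not
# the memory cell -- hence the emphasis on microarchitecture over new devices.
```

The order-of-magnitude gap between what DRAM-class accesses would cost and a sustainable power budget is what drives the microarchitecture focus Gara describes.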

New memory technologies are extremely important for the future. We know that the scaling of the physical DRAM device is getting much more difficult going forward. We have been struggling for some time with memory density improving at a much slower rate than we are able to increase compute performance. This has put extreme stress on users and without new memory technologies this skewing will continue. We already have much more silicon area in the memory than we have for the compute. We either find a new memory technology that eases this pressure by allowing for significantly higher densities or users will feel the pinch even more as we go forward. We want to build machines that are at least as usable as current machines so moving these memory technologies forward is really critical.

The Exascale Report: Are there any emerging memory technologies that you find particularly promising?

GARA: There are many that show incredible promise, and I would find it hard to bet on any one horse right now. We expect to see a lot of experimentation at the system level with these new technologies. They each have their strengths – their own attributes in terms of performance, resiliency, power, etc. – but the key drivers are capacity per dollar of memory and bandwidth per dollar. Power is really the fundamental challenge we face in getting to exascale; it’s very high on our list of focus areas. There are also new memory technologies that can be integrated directly on the compute die, and there are some that cannot. We could very well find that some of the options that could be integrated are not really optimal in terms of performance – or capacity per dollar – but because they can be integrated, they bring a different value. So we may want some of these, plus some other new memory technologies, to deal with the capacity problem we are facing.

Accelerators like GPUs have been fairly difficult for the community to use. They have been explored in HPC for more than a decade and there remain very few production codes which have shown better performance.

In other words, there may be multiple new technologies that emerge, with each finding its place within the system architecture. Some may win in the high-capacity, best-$/bit area, such as is needed in far memory and burst buffers for file systems, while others may emerge as viable high-density solutions that can be integrated into the same die as the processor core.

The Exascale Report: How about Near Threshold Voltage research? Is this part of your research domain – and is it yielding promising results for exascale?

GARA: Near Threshold Voltage really offers an opportunity to get significant improvements in energy efficiency at the transistor level. So yes, it plays a very important role and we’re looking at near threshold carefully. But as in all things, there’s no free lunch here. Near Threshold Voltage comes at a pretty significant decrease in the performance of those devices. The amount of silicon area you get per device and the performance of that device both go down. The reality is – what we really need is energy efficiency at the system level.

And since energy efficiency is probably our biggest challenge, this is a very important part of our research. In assessing these technologies, we need to take a broad system view. It is not just a question of how efficient a single transistor is, but of how efficient a system built out of such transistors is for real applications.

In other words, we need to transition our thinking from energy efficiency at the transistor level – to energy efficiency at the system level.

When we explore the question of whether Near Threshold Voltage shows promising results for exascale, getting to an answer is much more complex than a simple yes or no, because it makes assumptions about what user applications will look like in 5 to 10 years. We know that if our only requirement were to build a system that could achieve 1 EF/s for a simple code, we would be able to do that by the end of the decade. But we would not want to build a machine that is not highly usable, so the degree to which we push in directions like near threshold voltage is tempered by this. The long-term answer is that we will be able to operate in many different domains: at very low voltages when the application can exploit extreme levels of parallelism, and in a mode that is optimal for algorithms that have far less parallelism. As an architect, my job is to make this as accessible as possible to the user and, where viable, hide this complexity.

When we look at this from a system perspective, we have to look at many more things than just the device level – it comes down to the algorithms. There isn’t any one answer; there can be multiple answers depending on your algorithm. Maybe you have an enormous amount of concurrency and the frequency doesn’t matter – what really matters is that you just want to run a nearly infinite number of threads – and in that case, Near Threshold Voltage could be exactly what you are looking for. On the other hand, there are parts of algorithms that we typically see that don’t have that behavior, where at least for some time period the performance of a single thread is a limiter. As a result, I think the real answer here is that we need to make sure we can provide devices and architectures that allow us to do both and use the right one at the right time, and so we are working on techniques to do that both within a single device and core and in more heterogeneous types of architectures. There are, again, pluses and minuses to those two approaches. But I think we need to maintain that level of flexibility in the architecture, because assuming that we can drop the frequency by 10x and still continue to scale performance would be naïve. While things are moving quickly in the right direction, it’s going to take a long time before frequency doesn’t matter to the majority of applications.
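The shape of the near-threshold tradeoff Gara describes can be sketched with a first-order CMOS model. The alpha-power frequency law and all constants below are textbook approximations chosen for illustration, not Intel device data:

```python
# First-order sketch of the near-threshold-voltage (NTV) tradeoff.
# Model: alpha-power law for frequency, CV^2 scaling for dynamic energy.
# All constants are assumed textbook-style values.

V_T = 0.3  # assumed threshold voltage, volts

def rel_frequency(vdd):
    """Relative max frequency ~ (Vdd - Vt)^alpha / Vdd, with alpha ~= 1.3."""
    return (vdd - V_T) ** 1.3 / vdd

def rel_energy_per_op(vdd):
    """Relative dynamic switching energy per operation ~ Vdd^2."""
    return vdd ** 2

nominal, ntv = 1.0, 0.45  # nominal vs near-threshold supply voltage (assumed)

freq_ratio = rel_frequency(ntv) / rel_frequency(nominal)
energy_ratio = rel_energy_per_op(ntv) / rel_energy_per_op(nominal)

print(f"frequency at NTV: {freq_ratio:.2f}x nominal")
print(f"energy/op at NTV: {energy_ratio:.2f}x nominal")
# Under these assumptions, energy per op drops ~5x while frequency drops
# ~3x: a net efficiency win, but only if the application has enough
# parallelism to recover the lost single-thread speed with more threads.
```

This is why the answer depends on the algorithm: the low-voltage mode wins only when threads are abundant, which matches Gara’s case for architectures that can operate in both domains.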

The Exascale Report: One industry luminary was recently quoted as saying, “All HPC systems in the future will have GPUs.” Would you agree with this comment?

GARA: The industry has a long history of absorbing things that were once considered accelerators into the baseline architecture. One example is floating point units: these used to be add-on accelerator devices, much like GPUs are today. So in that context, I would not be surprised to see some aspects of what we currently think of as GPUs become baseline features. On the other hand, accelerators like GPUs have been fairly difficult for the community to use. They have been explored in HPC for more than a decade, and there remain very few production codes which have shown better performance. Some of this is due to them not being integrated more closely into the processor. You can see the trend that GPUs are going to be integrated more tightly with the processor. I don’t expect that most systems will be built with add-on cards similar to how GPUs are configured today; this direction would likely continue to have power and performance challenges. Valued features and concepts will be integrated into a CPU where it makes sense. As we have more transistors available in the future, integrating accelerators is a viable direction that we are exploring, but they need to have enough of an application reach to justify the silicon area.

If the US does not aggressively invest in HPC, the country could find itself in a very tough position.

If you look at the GPU roadmap, I think what you’ll see is GPUs morphing toward CPUs in a lot of the things they are doing, in trying to deal with the fact that they are just too far away from a general purpose processor.

The Exascale Report: Raj Hazra recently talked about fabric integration and the critical importance of co-design – not just along a vertical path, but also sideways, with horizontal co-design being key to achieving exascale. Is your group responsible for co-design strategies? What can you tell us about progress in this area?

GARA: Fabric integration is one of the natural next steps we need to take. We need it to deal with the latencies, performance levels, and cost levels that are necessary if we want to stay on this exponentially improving curve of performance. I think we at Intel have made a great deal of progress in that area – including some of the recent acquisitions – so we certainly take this very seriously, and I anticipate it being part of our roadmap in the future.

Co-design is fundamental to our being able to build systems that are usable, cost effective and power efficient. All the systems development efforts within Intel are very focused on this. We have made a lot of progress in this area, and we can see it bearing fruit in our products. With the inevitable technology disruptions that will be adopted in HPC, there is really no other way to effectively proceed and have any hope that what we are building makes sense. One big advance has been engagements with government agencies, where we are now involved in a number of programs – programs that provide critical feedback from the application experts on possible architectural directions long before the technology enters into pathfinding. This is extremely important to us, as we need to make technology and architecture choices at least 5 years before products are generally available; hence we are designing for applications 5 to 10 years out.

The Exascale Report: A number of spokespeople at Intel feel confident we will achieve exascale-level computation by the end of the decade. I assume you agree with this position, but what area of technology innovation do you see as a possible deal breaker?

GARA: Getting to exascale level computing by the end of the decade is certainly possible but there are a number of challenges that we still need to overcome. As I mentioned the biggest of these is probably energy efficiency. If one removes the energy efficiency constraint this becomes much easier to achieve.
There will need to be many innovations to pull this off. New memory microarchitectures are absolutely critical. Similarly we will be depending on continued scaling of our silicon technology. The last I would mention is silicon photonics. Supercomputers will be pushing the communications requirements. Without silicon photonics the cost of a highly usable system would likely be prohibitive.

The Exascale Report: As Intel’s Chief Exascale Architect, what is your perception of U.S. Technology Leadership today – and do you think the U.S. has any chance of being the first nation to field an exascale-class system?

GARA: It is really the U.S.’s to lose in some sense. There is no nation that is better positioned to achieve this. On the other hand we are seeing enormous investments being made into this area in many countries. We are in an era where HPC is really blossoming in terms of adoption. It is recognized by many countries as critical to their national competitiveness. A lot depends on how quickly the US government responds and how they get behind this. If the US does not aggressively invest in HPC, the country could find itself in a very tough position and it will be much harder to come back to a technology and computing leadership position.

The Exascale Report: As a community, what are we doing wrong, or what could we be doing differently as it relates to exascale research?

GARA: I think the U.S. emerging exascale community is like a family – everyone is in this together. I would not really call out any area where we are clearly doing something wrong. There are of course areas where this community could be doing better, there always will be, but the emerging Exascale community is a very close community thriving on very strong and widespread collaboration. One area where the HPC community sometimes struggles is with the definition of “goodness”. The idea of ranking supercomputers has had a dramatic impact on helping the community to focus on a concrete goal. But the time has come to change the way we measure these systems or we are at risk of pushing designs into a direction that does not make sense for general applications. This is being worked on and it is very important to keep the community focused but we need a metric that keeps us all moving in the right direction.
