May 2012
Toward Power-efficient Exascale Systems
In this article and audio podcast, we discuss Near Threshold Voltage (NTV) processing with Vivek De, an Intel Fellow and Director of Circuit Technology Research at Intel Labs.
NTV processing is a research area that holds tremendous promise for more efficient power management, and is applicable to numerous future computing applications ranging from mobile applications to HPC, and is likely to be one of the critical technologies required to enable power-efficient exascale systems.
This article is not meant to be a literal transcript of the audio interview with Vivek De. The author has taken certain liberties to capture the essence of the interview in the best way possible.
The Exascale Report: When we talk about Near Threshold Voltage, or NTV, what is “threshold” and what do you mean when you refer to “Near threshold?”
Vivek De: So, the threshold voltage of a transistor – which is essentially a switch – is defined as the voltage at which it turns on. If you raise the gate voltage well over the threshold voltage, the transistor is strongly on, and if you bring the gate voltage well below the threshold voltage, the transistor turns off completely.
One of the ways you want to improve energy efficiency of computing is to reduce voltage.
The highest energy efficiency of computing happens when you operate the transistors at a voltage just above threshold – as opposed to operating them well above threshold. We call this Near Threshold Voltage computing because the transistors are operating at their most energy-efficient point, which happens to be just above threshold due to a variety of factors.
TER: So this is the opposite of what I thought when I heard this term. When I heard “Near Threshold”, I thought it implied that you are tweaking these up as high as you can, not lowering them.
Vivek De: Normally, the transistors in microprocessors and other designs operate well above the threshold voltage – a volt or half a volt is a typical operating voltage in the designs you have today. The transistor then turns strongly on, which means it provides high performance, but it’s not necessarily the most energy efficient. Taking the voltage all the way down to near threshold, while keeping the transistor on, gives you a way of getting the highest energy efficiency possible out of the transistors in the circuits.
TER: So this is an active research program at Intel Labs. How long have you and your team been looking at this?
Vivek De: The genesis of this was about five years back, as a research project – as a research concept. Our designs had been operating at well above threshold for a long time and we were trying to push that voltage down over the years, but clearly not all the way to Near Threshold which happens to be at around 300 millivolts or so. We knew that if we could get to Near Threshold we could really improve energy efficiency even more than what we have, but the challenge is getting there.
So we started this research project a few years ago. We asked what the challenges are to get there, and what the benefits are if you can actually operate the transistors at near threshold voltage. Then we systematically tried to address the key challenges. We have built many research test chips and prototypes to capture the learning, so that going forward we can make this happen in designs all the way from deeply embedded computing platforms to regular compute platforms, as well as HPC and exascale. It has applications across the board – across a whole range of platforms and workloads in computing.
TER: Could you summarize some of these benefits for us?
Vivek De: The main advantage is energy efficiency. You measure energy efficiency as operations per watt, or as the number of joules – picojoules – consumed per compute operation. You want the operations per watt to be as high as possible, and the energy consumed per operation to be as low as possible.
With voltage scaling, you reduce the energy consumption per operation quadratically – it goes as the square of the voltage. So by going to Near Threshold, the main benefit is that you are pushing the limits of energy efficiency for your compute and for the other components in your platform.
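The quadratic relationship described above can be checked with a few lines of arithmetic. This is a minimal sketch using the textbook dynamic-switching model, with a made-up capacitance value chosen only for illustration; real designs see a smaller net gain because leakage and circuit overheads eat into it (the Claremont result discussed later was about 5x).

```python
# Dynamic switching energy per operation scales roughly as C * V^2.
# The capacitance figure below is hypothetical, chosen only for the example.

def energy_per_op(c_farads, v_volts):
    """Dynamic switching energy (joules) ~ C * V^2."""
    return c_farads * v_volts ** 2

C = 1e-9  # 1 nF of switched capacitance per operation (illustrative)

e_nominal = energy_per_op(C, 1.0)  # well above threshold, ~1.0 V
e_ntv = energy_per_op(C, 0.3)      # near threshold, ~300 mV

# Scaling 1.0 V down to 0.3 V cuts energy per op by (1.0/0.3)^2, about 11x
print(e_nominal / e_ntv)
```

The ideal ~11x figure is the upper bound the quadratic term alone would give; the realized system-level gain is lower once leakage and robustness overheads are accounted for.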
TER: And clearly this is a huge discussion from mobile devices all the way up to the largest systems.
Vivek De: Yes, energy efficiency is paramount for everything you can think of – all the way from deeply embedded computing platforms – to HPC and exascale. So, if you make it happen, it benefits everything across the board. It’s not limited to any particular segment.
TER: We saw a video from IDF where Intel demonstrated what I believe was referred to as a solar-powered NTV processor. Did I understand that correctly?
Vivek De: Yes, that was a concept processor. The solar-powered part came about as part of a clever demo. The processor was the first X86 IA processor at Intel – or anywhere in the world – to operate at Near Threshold Voltage. That was a big research prototype goal. We have done NTV in the past for small compute engines, accelerators, and other compute blocks – piecemeal – but this is where it all came together: a Pentium-class, full X86 processor design operating at Near Threshold Voltage. And not only at Near Threshold, but scaling all the way up to full voltage to span the range of power and performance that can be accomplished.
Once we had this in the lab, we saw that it operated at 3 milliwatts at Near Threshold for a full Pentium processor – which is an astoundingly small number. To make that point more emphatically, we realized this could run from a solar cell – it didn’t need a full power supply, and it didn’t need a lithium battery to operate. Powering it with a solar cell would show that it is so energy efficient it doesn’t even need a battery or a power supply to run a full Pentium-class processor core. That was the rationale behind the demo – to make that point emphatically.
TER: So today, this is still a concept as far as processors go?
Vivek De: Yes, the demo was clearly a research prototype and a concept demonstration. To make it happen in high-volume manufactured products, there are a lot of challenges that need to be addressed, so the research is continuing – and some advanced development is continuing – to determine how we solve all those challenges and really make it happen across a wide variety of platforms in a robust, high-volume manufacturable way.
TER: So can you give us some idea of when we might see this type of processor on the market?
Vivek De: I can’t address the timing. In the research lab, we have a portfolio of technologies that address the key challenges, and we engage with the various product groups to go through pathfinding and product development. Eventually the product groups will make the decision as to the right time, the right use of these technologies, and the right timeline, so I don’t have any specific estimate of when that may or may not happen.
TER: But it is starting to look like a reality that we may see this?
Vivek De: Well, it has come a long way. If you think about it, the research started as a concept five years back, and now you can have a full IA processor booting an OS and running workloads from a solar cell – but there is still some way to go. Not all the problems have been solved to make it actually happen in a real product, or in a real compute engine for HPC or exascale. One of our key objectives is to drive forward the research and the engagements with the product groups to really make it come true in a real product.
TER: So let’s discuss what kind of implications this might have for real world computing applications.
Vivek De: Well, there are tremendous implications. If you look at all compute platforms – from deeply embedded to handhelds and tablets, laptops, desktops, HPC servers, and exascale – they are all limited by energy efficiency. If you can boost energy efficiency by 3, 4 or 5x, there is no doubt about the benefits to be gained. Not only does it allow us to improve and have leadership capabilities in existing platforms, it also allows us to enter new markets where we do not yet have IA-based compute engines, especially deeply embedded applications and areas like that.
The other thing it provides is the capability to deliver performance on demand. For example, if you design a processor with Near Threshold Voltage capability, it’s not going to operate at NTV all the time. One of the things that happens when you reduce voltage is that performance goes down – the frequency goes down. So when responsiveness or peak performance is required, you’re not going to operate the processor at Near Threshold at that time. But by being able to operate at Near Threshold Voltage when appropriate, you are providing a choice of capabilities: when you need high performance, you have it at the time you need it; and when you are doing something that doesn’t need that tremendous performance, you can operate at Near Threshold to get the best energy efficiency for that performance demand, and provide the user experience that you need.
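The "choice of capabilities" described above amounts to picking an operating point from a voltage/frequency table based on the current performance demand. The sketch below illustrates the idea; the table values and function names are hypothetical, not measured Intel data or any real driver interface.

```python
# Hypothetical operating points for an NTV-capable core: (voltage V, frequency MHz).
# All numbers are illustrative only.
OPERATING_POINTS = [
    (0.3, 10),    # near threshold: best energy efficiency
    (0.6, 300),
    (0.9, 800),
    (1.2, 1500),  # full voltage: peak performance
]

def pick_operating_point(required_mhz):
    """Choose the lowest-voltage (most efficient) point that meets the demand."""
    for volts, mhz in OPERATING_POINTS:
        if mhz >= required_mhz:
            return volts, mhz
    return OPERATING_POINTS[-1]  # demand exceeds the range: run flat out

print(pick_operating_point(5))    # light background work stays at NTV
print(pick_operating_point(700))  # responsiveness pulls the core up to 0.9 V
```

The design choice mirrors the interview: default toward the NTV end for efficiency, and climb the table only when the workload demands it.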
TER: So this is like on-demand resources – but specifically, on-demand power?
Vivek De: Correct – power for the performance you need. You provide the performance the workload demands in the most energy-efficient way by using voltage and frequency as a knob, and you can also push them down to the ultimate energy efficiency point when that level of performance is sufficient.
TER: So, in very large systems with thousands or tens of thousands of processors, the implications here could be tremendous – because today, even when those resources are sitting idle, they are still drawing significant power.
Vivek De: Correct. For idle power management, one of the ways we do idle power management today is to turn things off. So, when you are not using something – a component or a core – you turn it off. And that saves power. But when you want to go from that state to servicing something that is coming up, it takes time to wake up and service that. And that latency sometimes is not tolerable. Sometimes you say, well, if I go into sleep, it will take me time to wake up, and I may not be able to wake up in time to service the next workload, so I should not go to sleep. So, a lot of the time, we waste the opportunity to go to sleep even if we’re not doing much at that point.
By enabling this NTV mode, I can have a parking state, not a sleeping state – a half-awake state. We operate at a very low voltage and frequency because we’re not doing anything that is performance demanding. And I can wake up quickly, because going from NTV to higher voltage is much quicker than going from off to on. So we can provide better overall performance and energy efficiency by having this NTV-enabled half-awake state that can respond quickly on demand to whatever comes up – wake up fully and service the next workload.
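The parking-state argument is a latency trade: a core can only take a power-saving state if it can wake in time for the next request. A minimal sketch of that decision follows, with entirely made-up latency numbers – the point is only the ordering (NTV park wakes far faster than fully off).

```python
# Hypothetical wake-up latencies for three core states (illustrative numbers).
WAKE_LATENCY_US = {
    "active": 0,
    "ntv_park": 5,      # half-awake: just raise voltage/frequency
    "power_off": 500,   # fully off: power-gate exit plus state restore
}

def can_service(state, deadline_us):
    """A request can be serviced only if the core wakes before its deadline."""
    return WAKE_LATENCY_US[state] <= deadline_us

# With a 50 us deadline, a fully-off core cannot wake in time, so today the
# core would skip the sleep state entirely and waste the idle opportunity.
print(can_service("power_off", 50))  # False
# The NTV parking state meets the deadline while still saving power.
print(can_service("ntv_park", 50))   # True
```

This captures why the interview calls NTV parking a win for both responsiveness and energy: it makes the low-power state usable in cases where full sleep is not.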
TER: Where would you see more potential benefit from this type of technology – at the mobile computing end or at the very high end?
Vivek De: At the mobile and embedded computing end, the applications of this technology would be to hit the peak energy efficiency point, service variable workloads, wake up quickly, and things like that. For HPC and eventually exascale, there is another benefit that is not fully exploitable in other applications. If you are interested in throughput – like exascale throughput – one way to achieve it is with a few components operating at high voltage, at the maximum performance point. But when you have highly parallel applications, the other way to service that throughput is with many, many components, each operating at a lower frequency, that in parallel provide the throughput the workload wants. For that, you have to have applications that can exploit parallelism, and that obviously is true in the HPC and exascale segments.
So in that case, instead of using tens of cores, you would use hundreds of cores and operate all of them at NTV to provide exascale-class performance – while for each compute operation you are doing, you are operating at NTV, so you are getting the best energy efficiency. That satisfies both the throughput requirement of exascale, which is very high of course, and the energy efficiency demand.
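The trade described above – many slow, efficient cores matching the throughput of a few fast, hungry ones – can be checked with back-of-envelope arithmetic. All figures in this sketch are hypothetical, chosen only to make the comparison concrete.

```python
# Back-of-envelope comparison: few full-voltage cores vs. many NTV cores
# delivering the same aggregate throughput. All numbers are illustrative.

def aggregate(cores, ops_per_sec_each, watts_each):
    """Total throughput (ops/s) and total power (W) of a homogeneous core array."""
    return cores * ops_per_sec_each, cores * watts_each

# 10 cores at full voltage: fast per core, but power hungry.
fast_tp, fast_w = aggregate(10, 1e9, 10.0)    # 1e10 ops/s at 100 W

# 200 NTV cores: each 20x slower, but far lower power per core.
ntv_tp, ntv_w = aggregate(200, 0.5e8, 0.2)    # 1e10 ops/s at 40 W

assert fast_tp == ntv_tp         # same throughput, if the workload parallelizes
print(fast_w / ntv_w)            # the NTV array wins 2.5x in ops per watt here
```

The caveat from the interview applies: this only works when the application can actually exploit the 20x wider parallelism, which is why the argument is made for HPC and exascale workloads.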
TER: So, Near Threshold Voltage... the potential benefits sound tremendous. Why haven’t we done this already? What’s holding us back?
Vivek De: So, while we’ve solved a number of challenges in the research phase, there are still some challenges to be addressed to deploy this as a product or in high volume.
The number one challenge is that when you scale the voltage down to that level, you are more susceptible to variations – variations in transistor characteristics that arise when you integrate a large number of transistors on a die, and process variations die-to-die. We are trying to reduce these through manufacturing technology, but some variation remains, and its impact is excessively magnified when you get to Near Threshold Voltage. So we need to find design techniques that make the circuits more robust at Near Threshold. They are sufficiently robust today in the voltage range we operate at, and in the lab the robustness is not that critical because you are only measuring and characterizing a few chips. But if you want to make a hundred million of them, the robustness of every one of those chips is critical.
So, we need to make the circuits more robust. That’s the number one challenge, and there are techniques we are investigating to make that happen.
The second is the power supply: to supply that low voltage, you need a voltage regulator that generates the low voltage efficiently and delivers a range of currents to the chip in a very efficient way.
The third is soft error – reliability in general. As you scale voltage down to those kinds of values, the rates of soft errors, other marginalities, and intermittent noise-induced failures can go up. You need technologies deployed not only at the circuit level but across the hardware and software stack to keep those error rates at a reasonable level, and to make sure they are not so high that they impact the software running on the system.
So, a combination of these things has to happen to make a reliable system platform operating at NTV.
And the fourth area is designing these kinds of processors for NTV. You need to modify the design tools in important ways to get timing closure across a wide range of voltages. We typically do timing closure around the one- or two-volt range because that’s the range of operation. If you have a wide voltage range of operation, timing closure needs to comprehend not only how things behave at one voltage, but across the whole range, including NTV. So design tools and technology to handle all of that are required to make this happen in a product and in a large system.
TER: So you must have a rather large group working on this area of research?
Vivek De: In the circuits lab here, we are focused on the circuits aspect of it, but we are collaborating with groups across Intel labs, and other business units and product development teams to collaboratively work on these challenges and see how we can take it forward.
TER: This is one of the most exciting areas of research that we’ve talked about in a long time.
Vivek De: Yeah, I’m pretty excited about it.
TER: Now, you recently published several papers on NTV at the International Solid-State Circuits Conference (ISSCC). Can you tell us more about that?
Vivek De: Sure. We had three key papers focused on NTV at the conference. The first was the Claremont paper, on the NTV IA processor, which detailed all the technical work that enabled NTV operation of Claremont. It showed about a 5x improvement in energy efficiency for an IA processor operating at NTV, and a wide range of operation – from 280 millivolts all the way to 1.2 volts – along with the performance and power across that range. We had also deployed some of the design technologies for timing closure, to properly handle the variations and design the circuits to be robust at low voltage, so we detailed all those technology accomplishments in the ISSCC paper.
The second was a paper about a SIMD engine. It was not a full processor, but a two-dimensional permute engine that can do any-to-any shuffle in a two-dimensional matrix. It was built on a 22-nanometer CMOS process, demonstrated operation at NTV with many new circuit techniques to provide robustness there, and showed a 10x energy efficiency improvement for the accelerator block relative to the nominal operating point.
The third paper talked about how to scale the voltage of on-die SRAM and register files. A compute engine – an SOC or microprocessor – has logic circuits, and it has on-die memory: SRAM and register files. That is volatile memory for holding data, not hard-drive storage. One of the challenges of pushing the voltage down, especially to NTV, is that the circuit technologies you use to accomplish that cost you some area – they increase the size of the circuits – and as a result the capacitance you switch also increases somewhat. If you push that to the extreme, those overheads are not tolerable. So the goal here was to determine how to scale the voltage without paying too much area cost in the process.
TER: It seems like we’ve made so much progress, yet any of the remaining barriers could be monumental. What do you see as the next steps or milestones that could bring us closer to reality with this technology?
Vivek De: I think the vectors I mentioned. One is robustness. The second is voltage regulators. And the third is this: if you use tens or hundreds of cores operating at NTV, inter-core communication becomes very important. Clearly, not all workloads will have data local to the compute. The more parallel you go, the more important data locality and inter-core communication efficiency become – your concurrency is limited by them. So you need a very efficient on-die interconnect fabric to enable NTV operation. You can go to a hundred cores, but if the interconnect fabric is not efficient, then you can’t fully exploit the capability of those hundred cores operating at NTV.
So that’s another challenge at the on-die interconnect fabric level.
And then challenges related to errors – intermittent errors.
Once you have designed in robustness against process variations, there will still be soft errors and noise-induced errors, which are infrequent and intermittent. When you operate at NTV, those error rates are higher, and you need a mechanism for detection and response across all layers of your platform – hardware, software, applications, OS, everything – to make sure those higher error rates do not impact the performance or the reliability of the platform.
So it spans a broad range of challenges, but we have a line of sight to the solutions and we know how to go about attacking those problems. Now we have to figure out which are the right solutions, and solve the problems in the right way.
TER: So, is NTV one of the huge steps necessary to get us to exascale within an acceptable power range?
Vivek De: Well, this is one of the key components. There are many, many other challenges obviously – there’s resiliency, concurrency, locality, algorithms, software, compilers, all sorts of things that have to come together to make that happen. But clearly on the circuit and design side, if you think of one thing that’s absolutely needed to make that happen, there is no doubt in our minds that it’s NTV.