In this podcast from ISC 2016 in Frankfurt, Steve Pawlowski from Micron discusses the latest memory technology trends for high performance computing.
insideHPC: Welcome to the Rich Report, a podcast with news and information on high performance computing. Today my guest is from Micron. We have Steve Pawlowski. He is the VP of Advanced Computing Solutions at the company. Steve, welcome to Germany.
Steve Pawlowski: Thank you, Rich. I’ve been here many, many times over my career, so it’s always nice to be back.
insideHPC: Yeah. I think it’s great that we had a chance to chat. ISC starts in earnest tomorrow, but I thought we could use this opportunity to kind of catch up with Micron, what’s going on with memory, with HPC, with new kinds of devices coming online, new memory hierarchies, all this kind of stuff. So Steve, let’s start at the beginning. Who is Micron and who do you help in this space?
Steve Pawlowski: Well, Micron is a memory manufacturer. We manufacture DRAM as well as NAND and NAND SSDs and then NOR devices. Essentially, we help everybody. Anybody who needs a memory solution– memory’s a very critical component of every one of these systems that get built, whether you’re using Intel-based systems, ARM-based systems, GPGPU, they all use a common set of memory and that’s what Micron produces.
insideHPC: And to put this in perspective, the new TOP500 list comes out tomorrow. These are the biggest systems in the world, of course. And memory kind of comprises a big proportion of the cost of a system of this scale.
Steve Pawlowski: It does. Roughly, the general rule of thumb is about 30 percent of the cost of the system.
insideHPC: Thirty percent? Wow! And we’re talking multimillion-dollar systems, so lots of money. And it also accounts for a good portion of the power consumption for the overall system, doesn’t it?
Steve Pawlowski: Yeah. It’s not only the power consumption of the devices themselves when you’re actually doing a full access and lighting them up, but the majority of that power is actually in the transfer of data from memory to the compute element to do whatever computing there is and then transferring that data back.
insideHPC: Yeah. So we’ve talked a lot in the past, with supercomputing and architectures, about how memory speed hasn’t kept up with Moore’s law, which is doubling the transistors and getting these speed increases. But memory has scaled much more slowly in the past, hasn’t it?
Steve Pawlowski: Yeah. As the scaling continues, we get smaller and smaller DRAM cell sizes, just the basic DRAM, whether it happens to be a planar device or a deep trench device. And you’re right, the latencies have roughly stayed the same over the past several years. Now when I started my career at Intel in 1982, the memory latencies, access latencies actually dominated the CPU speed, so really, the CPU wasn’t waiting for memory as much as it is today. Now in a lot of senses, over the years, we’ve not done a really good job from the CPU side. In order to be able to compensate for the latency, we’ve added more and more caching, threading, things of that nature, to try to mitigate the latency issue. So there really hasn’t been a strong motivation for the DRAM vendors to drive latency down, as opposed to just getting more bits per cell and getting the cost on a per-bit basis down.
insideHPC: All right, Steve. Well, not to beat up on the industry because over that same timespan we just discussed, we are getting a lot more memory capacity than we used to for the dollar rate in the same amount of space. So that has been on a very pleasant curve, especially in HPC where you need more memory for multi-core processors. Let’s talk a little bit about the new technologies. What comes after the traditional DRAM we see today in a cluster? Is it DDR4 we’re at today?
Steve Pawlowski: We’re at DDR4 and certainly, the industry is looking at next generation DDR5. Whether that is a serial interface or the same parallel interface, those conversations are going on now. It could be both.
insideHPC: And that’s still a dynamic memory that needs to be refreshed?
Steve Pawlowski: Yeah. Even when you look at non-volatile memories, eventually, they need to be refreshed. Now they don’t need to be refreshed as often, but they will still need some type of a refresh cycle.
insideHPC: Okay. What about the 3D stacked memory, the things we’ve been hearing about? Are those coming to market? Are they around the corner? Where are you guys at?
Steve Pawlowski: Certainly in NAND. The 3D memory, the 3D stacking is a more mature technology in that space, and yes, it is on the way to coming to market, and we’ll certainly see it in SSDs. DRAM is really not architected for something like that in terms of the process, and there are other technologies like XPoint that will still take some time. They’re going to come to memory at some point, but they’ll tend to start to show up more and more, but certainly, 3D NAND will lead the way.
insideHPC: Refresh my memory. You guys made an announcement with Intel. Is 3D XPoint the name of the technology and you both have your implementations of it coming? I think they call theirs Optane. Did I get that straight? And yours is called something else?
Steve Pawlowski: I don’t know what we’re calling ours because I’m kind of at the head end [laughter] of a lot of this stuff in terms of what we do in the next three to five years. But yeah, it was jointly developed, and Intel has its product roadmap that they’ve either announced or they will announce at some point in time. And we’re certainly looking at products that leverage and utilize the same technology.
insideHPC: Do these things allow you to kind of architect a supercomputer in a different way than we traditionally have done?
Steve Pawlowski: Well, they do. As I mentioned, I spent a lot of my career at Intel. I left in 2014 and joined Micron. For years, we looked at the new non-volatile technologies that came out, like NAND and whatnot, in order to see, is there a way that we can potentially find a use for that in the standard memory footprint? It was difficult because of those latencies. When you look at a technology like 3D XPoint and some of the new materials the industry is looking at, those latencies are becoming more DRAM-like, which makes them a more attractive option to look at. Is there a way we can actually inject persistent memory that’s fairly high-performance so we don’t take a performance hit but we can certainly increase the capacity on a cost-per-bit basis versus what we have today?
insideHPC: Yeah. So you have these various tiers. Cache, closest to the processor and going out to these things. What’s your mission this week? What kind of things are you trying to share with your customers? Because they buy these things three years out, right? These plans, big supercomputers, right?
Steve Pawlowski: There’s obviously the mission to sell the products that we have, and the products we certainly have in the pipeline, which are mainly DRAM-type products and SSDs. What I do is, and when I came to Micron, it was really focusing on the convergence of computing and memory, and that isn’t necessarily, “We’re going to move the CPU to a memory process.” They’re just two different things, and that’s not necessarily the best way to spend your dollars. But is there a confluence when you take an architecture like hybrid memory cube, where memory and logic are very close together to minimize energy, and then we can optimize the memory architecture. Because the interface between the memory and the logic isn’t a standard interface, where we can optimize that architecture for highest bandwidth, highest parallelism in terms of access to get as much efficiency out of that interface as possible.
Then with that logic layer, you can start converging and bringing some compute structures in that. It’s not just a controller like HMC is today. Potentially, it can evolve and be more of a computing platform, with logic and memory. There’s some things you can do in logic very well, like floating point. Very, very efficient operation. But things like scatter/gather, we can potentially do that in the memory and get that information available, or pointer chasing, sort. There are certain types of functions we can actually do in memory more efficiently than bringing that data into the processor, doing a few bits of compute and then pushing it back out.
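To make the scatter/gather point concrete, here is a minimal sketch that counts how many bytes cross the memory interface for a gather done on the host versus one done next to the memory. Everything in it is an illustrative assumption and not a description of HMC or any Micron product: the 64-byte cache line, the 8-byte element size, and the function names are all hypothetical, and the cost of sending the index list to the memory is ignored.

```python
CACHE_LINE_BYTES = 64   # assumed transfer granularity for a host-side access
ELEMENT_BYTES = 8       # assumed size of each gathered element


def gather(memory, indices):
    """Plain gather: pick the elements at the given indices."""
    return [memory[i] for i in indices]


def bytes_moved_host_gather(indices, element_bytes=ELEMENT_BYTES,
                            line_bytes=CACHE_LINE_BYTES):
    """Host-side gather: every distinct cache line touched is moved in full."""
    lines_touched = {(i * element_bytes) // line_bytes for i in indices}
    return len(lines_touched) * line_bytes


def bytes_moved_near_memory_gather(indices, element_bytes=ELEMENT_BYTES):
    """Near-memory gather: only the packed result crosses the interface."""
    return len(indices) * element_bytes


if __name__ == "__main__":
    memory = list(range(1000))
    indices = [0, 100, 200, 300]          # sparse, one element per cache line
    print(gather(memory, indices))
    print("host gather bytes:", bytes_moved_host_gather(indices))
    print("near-memory gather bytes:", bytes_moved_near_memory_gather(indices))
```

Under these assumptions, a sparse access pattern makes the host-side gather move a full line for every element it wants, while the near-memory version moves only the elements themselves.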
insideHPC: Yeah. Because every time you move the data, as I say, there’s joules involved. More than the joules involved of computing the thing, processing it and moving it takes a large amount of energy.
Steve Pawlowski: Large amount of that. The interesting thing is when you do an access in memory, in just a standard bank, 16,000 bits, so 16 kilobits, are activated. And generally speaking, they’re not bringing two kilobytes of information in per every access. So a lot of that information is read, they’ll bring in 64 or 128 bits or whatever it happens to be, and then the rest of it gets written back out. And that’s essentially lost energy.
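The arithmetic behind that point can be sketched as follows. The row size of 16 kilobits (two kilobytes) comes from the discussion; treating it as exactly 16,384 bits, and using 64 and 128 bits as the amounts actually transferred, are assumptions made for illustration. The sketch counts bits only and does not model energy.

```python
ROW_BITS = 16 * 1024  # 16 kilobits (2 kilobytes) activated per access, assumed exact


def unused_fraction(bits_transferred, row_bits=ROW_BITS):
    """Fraction of the activated row that is read and written back unused."""
    if not 0 <= bits_transferred <= row_bits:
        raise ValueError("bits_transferred must be between 0 and row_bits")
    return 1 - bits_transferred / row_bits


if __name__ == "__main__":
    for bits in (64, 128):  # illustrative transfer sizes
        print(f"{bits} bits used: {unused_fraction(bits):.2%} of the row unused")
```

With these numbers, more than 99 percent of the bits activated in the row are never sent to the processor.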
insideHPC: And when you multiply this times thousands of cores or the new Chinese machine that they’re going to announce very soon here, it’s over 10 million cores, it’s mind boggling.
Steve Pawlowski: It is mind boggling [chuckles].
insideHPC: But a lot of memory involved there as well…
Steve Pawlowski: And I feel sorry for the software people that have to program too many [laughter] cores.
insideHPC: Yeah. I think the next largest one on the Top500 has 400,000 cores or something like that. So the moral of the story, memory, it’s the bread and butter, right? And keep it close to the processing, don’t move the data around. But it sounds like a lot of exciting things are coming around the corner. As Moore’s law seems to be flattening out, there’s other ways to speed up the machines.
Steve Pawlowski: Certainly. Yeah. And it’s really focusing on the energy, and for years, as the industry has done more and more multi-core systems, and adding more cores to the logic, capacity per core and bandwidth per core have not kept up. Now if you look at architectures where you assume– if we take the assumption that– let’s assume that every compute node is four cores, and it has its own stack of memory. And so on a per core basis, it has consistent bandwidth and consistent capacity.
Now if you take those memory logic components and you add a million – so you have four million cores in that case – you’ve increased the memory capacity by four million, and the bandwidth per core hasn’t changed. So if you can come up with an optimal solution with one brick, and you can expand it to a million, you still allow the scalability. Because as you add more cores to the system, you’re just– if we live with standard types of memory interfaces or we go do boutique-type solutions, they’re expensive, they’re power-hungry and you can still see how bandwidth and capacity per core have not kept up on them.
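The "brick" scaling argument can be sketched with a small model. Only the four cores per brick and the count of one million bricks come from the conversation; the capacity and bandwidth figures, and all names below, are hypothetical placeholders chosen to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brick:
    """One compute node with its own memory stack (figures are hypothetical)."""
    cores: int = 4                 # from the discussion
    capacity_gb: float = 8.0       # assumed capacity of the local stack
    bandwidth_gbs: float = 160.0   # assumed bandwidth of the local stack


def scale_out(brick, n_bricks):
    """Totals when each added brick brings its own memory stack."""
    total_cores = brick.cores * n_bricks
    return {
        "cores": total_cores,
        "capacity_gb": brick.capacity_gb * n_bricks,
        "bandwidth_per_core_gbs": brick.bandwidth_gbs * n_bricks / total_cores,
    }


def shared_bandwidth_per_core(total_bandwidth_gbs, cores):
    """Contrast case: a fixed memory interface shared by a growing core count."""
    return total_bandwidth_gbs / cores


if __name__ == "__main__":
    for n in (1, 1_000_000):
        print(n, scale_out(Brick(), n))
    for cores in (4, 16, 64):
        print(cores, "cores share", shared_bandwidth_per_core(160.0, cores), "GB/s each")
```

In the brick model the bandwidth per core is the same at one brick and at a million, while in the shared-interface case it falls as cores are added.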
insideHPC: They’ve not kept up. Before I let you go, I do want to ask about the Micron Automata processor. I last saw it last year at Supercomputing, where you were just talking about it. Is that coming close to market or are you still showcasing that as a potential technology?
Steve Pawlowski: We actually have working silicon, and we have working boards, and we’re focusing on– things like Automata which is, even though it’s in a memory process, it really is a processor. So it is not a memory device. Now it falls in that class like the GPGPU did a few years ago, being an accelerator. And so a lot of software has to be developed to be able to allow that to exist inside the system. And it’s a different programming paradigm, because it is a very parallel machine, with tools that have to be able to take whatever that algorithm is and map it on that fabric. We’re going to be spending more time working with the research and development community in terms of taking those devices, focusing on the algorithms that would map to it, and then that’ll inform us on where we take that architecture going forward.
insideHPC: It’s pretty exciting.
Steve Pawlowski: Its performance is actually a lot better than I would have anticipated for the workloads that it’s really good for. It’s not a general purpose computer by any stretch. But if you’ve got something that fits on there right, it has great potential.
insideHPC: That’s cool. Well, Steve, I want to thank you once again, and it was great to have you on the show today.
Steve Pawlowski: Thank you, Rich. I’ve enjoyed it.
insideHPC: Okay, folks. That’s it for the Rich Report. Stay tuned for more news and information on high performance computing.