“By providing a fundamentally new and powerful technology, plus the tools to operate and program it, Micron is providing developers and customers an entirely new way to power their innovation,” said Paul Dlugosch, director of Automata Processor development for Micron’s compute and networking business unit. “One of the most challenging problems facing the developer community today is programmer productivity. In many cases, productivity is lost as developers work to identify and implement high levels of parallelism on conventional architectures. The Automata Processor and SDK will provide a new alternative for implementing very high levels of hardware parallelism without the complexities associated with von Neumann-style architectures.”
insideHPC: Paul, I understand you got something very exciting to show us. What is this new thing and what does it do?
Paul Dlugosch: This is Micron’s Automata Processor. The basis of the Automata Processor is really to accelerate graph processing. That’s really the simplest way to describe this new kind of processing technology that we have.
insideHPC: I know Micron as a company that makes memory. What are you guys doing making a processor?
Paul Dlugosch: That’s a great question. Memory happens to be useful for many things, and it’s only now that manufacturers like Micron are beginning to explore alternative uses of memory. In this case, rather than using memory as a read-write device, which we all know and understand, the memory is actually the central core of a small processing element. The memory device implements the vertex in a graph. And the memory, rather than storing information, is actually matching information on an input data stream. So we’re able to construct large graphs in the chip and process them very quickly and efficiently.
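The idea of a memory array doing matching rather than storage can be sketched in a few lines. In the sketch below, one matching element is modeled as a 256-entry bit vector addressed by the input byte, with a stored 1 meaning "this symbol matches." This is a conceptual illustration under assumed names (the `STE` class is hypothetical), not Micron's actual circuit design:

```python
# Illustrative sketch: a memory column used as a symbol matcher.
# Each element stores a 256-bit vector; the input byte selects a
# row, and the stored bit says whether that symbol matches. This
# models the idea described in the interview, not the real design.

class STE:
    """Hypothetical state-transition element backed by a bit vector."""

    def __init__(self, symbols):
        # Bit vector indexed by byte value: 1 = this symbol matches.
        self.bits = [0] * 256
        for s in symbols:
            self.bits[ord(s)] = 1

    def matches(self, byte):
        # "Reading" the memory at the row selected by the input byte.
        return self.bits[byte] == 1

# Every element sees the same input byte at once; the parallelism
# falls out of all columns being read in the same row access.
stes = [STE("d"), STE("o"), STE("n"), STE("u"), STE("t")]
byte = ord("o")
hits = [ste.matches(byte) for ste in stes]
print(hits)
```

Only the element programmed for `'o'` reports a match; all the others evaluate the same input byte simultaneously.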
insideHPC: But when you get down to the nitty gritty, what this is, is a PCIe accelerator that plugs into a traditional x86 server. Correct?
Paul Dlugosch: That’s correct. The Automata Processor is not designed to run an operating system. It is designed to have complex, unstructured data written to it and to perform graph processing on that data, and provide analytic results back to the user or to the host system.
insideHPC: Tell me more, because we got one here. Is this a functioning thing that you guys are showing in the booth today?
Paul Dlugosch: Yeah, it is. This is our first PCIe accelerator; it’s a development board. This particular one has 48 Automata Processors on it. You might be confused because they look like standard SODIMM modules, and in fact they’re designed to be interchangeable on this board with standard SODIMM modules, interchangeable with DRAM in other words. This board is being used, and we’re showing demonstrations and simulations right now of the board performing processing in applications such as bioinformatics, cyber security, and association rule mining. In other words, big data analytics. Those are some of the demos that we’ve set up to show here today, giving people an idea of how this new kind of processing technology can tackle these really big problems.
insideHPC: Help me decouple this from some other announcements that you guys have made about future technologies. There’s no hybrid memory cube here?
Paul Dlugosch: In this particular version of this board there is not. One of the things about the Automata Processor is that it really operates for the most part without needing supplemental memory, okay?
It’s an interesting byproduct of this kind of processing: conventional compute systems sometimes need very large amounts of memory to perform this kind of processing. When the Automata Processor is running these applications, the need for external memory goes down substantially, and the need for high-performance memory goes down substantially. So the Automata Processor is really doing the heavy lifting in the system.
insideHPC: We all know graph processing can really beat up a traditional processor, with a lot of data and everything. It’s pretty painful. How big a data set are we talking about? What are the capabilities of this thing?
Paul Dlugosch: Well, each Automata Processor has the capability to perform about six trillion match operations per second; that’s one device. On a board like this, where we have 48 of them, you can see how that scales up quickly. It’s all about the massive parallelism that has always existed in a memory device. The DRAM that’s in your cellphone today is a tremendously parallel device. However, the way it’s used in systems, we aren’t really able to exploit the massive parallelism and the capability of that DRAM device. We’ve unlocked it with the Automata Processor.
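Taking the figures quoted here at face value, the aggregate scaling is straightforward arithmetic (an illustration only, using only the numbers stated in the interview):

```python
# Back-of-the-envelope scaling from the figures quoted above.
per_chip = 6e12        # ~6 trillion match operations/second per device
chips_per_board = 48   # devices on the development board described
aggregate = per_chip * chips_per_board
print(f"{aggregate:.2e} match operations/second per board")
```

That works out to roughly 288 trillion match operations per second for the 48-chip board.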
insideHPC: Well, for those that might not know, describe a graph processing problem that this thing might do. I see you have an example here.
Paul Dlugosch: Well, this is a very trivial example, but it serves to at least give people an idea about how the Automata Processor works. What we’re looking at right now is what we call the AP Workbench, the Automata Processor workbench. It is an environment where designers can sit down and design automata, almost the way that they would design a circuit schematic. They can drag elements onto the fabric here, and there are other elements: counters and logic gates. They can connect these elements, and then ultimately compile that design and load it. And what happens in the Automata Processor is that the device is configured to exactly implement this graph. Where we see lines between nodes, we call those edges; those are actually wires that are connected on chip through a configuration, and this graph is directly and exactly imported into the device. Now the really great thing is, here we have one graph. It’s designed to find a couple of variations of the word donut. Designers, for example in cyber security, may have thousands of patterns that they want to run to secure their network. The Automata Processor will compile those thousands of patterns, load them all into the fabric, and they all operate in parallel, without the user having to really even think about how to implement parallelism. It happens automatically.
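The behavior described here, many patterns watching one input stream at once, can be approximated in software with a small nondeterministic-automaton simulation. The sketch below is a plain software analogue, not the AP Workbench or its compiler; the function and variable names are this editor's own:

```python
# Software analogue of the example above: a tiny nondeterministic
# automaton that recognizes the patterns "donut" and "doughnut"
# simultaneously in a single pass over the input stream.

# Each pattern becomes a chain of states; all chains run "in parallel".
PATTERNS = ["donut", "doughnut"]

def scan(stream):
    """Report (end_offset, pattern) for every match in the stream."""
    active = []    # (pattern, next_index) pairs currently in flight
    reports = []
    for pos, ch in enumerate(stream):
        # Start-of-pattern states are always listening, like
        # start-anywhere elements on the automata fabric.
        active.extend((p, 0) for p in PATTERNS)
        next_active = []
        for pattern, i in active:
            if pattern[i] == ch:
                if i + 1 == len(pattern):
                    reports.append((pos, pattern))   # reporting state fired
                else:
                    next_active.append((pattern, i + 1))
        active = next_active
    return reports

print(scan("I ate a doughnut and a donut."))
```

On hardware the chains are physical circuits that all see each input byte in the same cycle, so adding thousands of patterns does not slow the scan down the way it would in this serial simulation.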
insideHPC: If I’m a data scientist, then this is the kind of thing that I do for a living. Do I have to throw out everything I knew to use this device or how do they interface with it?
Paul Dlugosch: You don’t have to throw out everything you know. What you have to do is remember some of the things that you were taught long ago, probably in a computer science class. Automata theory is not new. What is new is a hardware device now capable of running these kinds of machines, and doing so in a massively parallel way. So yes, users are having to unwind how they think about attacking a problem with a conventional CPU. How do I take a very parallel problem and map it onto a multi-core architecture? We all know that’s hard; programmer productivity goes down. Here, the parallelism comes for free, and you’ve got to now remember your automata theory: how would I look at a problem differently? We have some great researchers from Georgia Tech and the University of Virginia, where they’ve opened a research center on automata processing. Some of the examples that we have here today are actually examples and demonstrations developed by the academic community, which is really starting to embrace this technology.
insideHPC: If one of my listeners, or viewers I should say, sees this and their ears perk up, how do they engage or is there a way for them to kick the tires?
Paul Dlugosch: Yes. Well, there is. Just shortly before the show, we announced the general availability of our tool chain. If a user who’s interested would go to www.micron.com/automata, they would find a wealth of information. Our APIs are fully described there. The technical specifications for the product are described there. And maybe most importantly, they can download the tool chain, and actually begin doing these kinds of designs.
insideHPC: Wow, that’s really exciting. Is it shipping today? Can I buy this thing?
Paul Dlugosch: You can’t buy it today. We’re still in the process of fully bringing up the technology. In the first calendar quarter of 2015, we expect to have this board fully brought up, ready to go, and to begin doing some of our first distributions to selected early adopters.
insideHPC: Well Paul, this is very exciting. I guess congratulations are in order.
Paul Dlugosch: Well, thank you. It’s been a number of years in development. We’re starting to be more public about it. We’re really excited about the prospects and the opportunities for this technology.
Check out our Full Coverage of SC14.