The Open Compute Project got a major endorsement in the HPC space last week with the news of NNSA’s pending deployment of 7 to 9 Petaflops of Tundra clusters from Penguin Computing. To learn more, we caught up with Dan Dowling, Penguin’s VP of Engineering Services.
insideHPC: Is this your biggest Open Compute Project deal to date?
Dan Dowling: Yes, but the actual contract is for a series of computers. It’s not a single 7 Petaflop cluster. It’s actually a set of contract options that add up to $39 million. Of course, there are options for more after that, but that’s the base award funded by the NNSA.
insideHPC: What would you say attracted them to the Open Compute Tundra platform? Was it the power efficiency or density?
Dan Dowling: Actually, they didn’t take full advantage of the density. The density on the OCP rack goes up to 108 nodes. But they opted for something like 96 nodes to leave room for a few rack units of switches and things.
They actually wanted to optimize around their data center infrastructure. So we optimized around a 24 kilowatt rack, which is one of the sweet spots for the 480/277-volt power they have. Their infrastructure allows them to come in at a higher voltage without step-down transformers. So that was part of it. Power density wasn’t, but power simplicity was.
With a single power shelf we have nine rectifiers, set up as eight plus one to deliver the 24 kilowatts per rack. So basically the whole rack has redundant power with one spare power supply: nine elements in one power shelf. That was attractive to them, along with the economies of scale that you get by disaggregating power.
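The 8+1 redundancy described above can be sketched as simple arithmetic. Note that the 3 kW-per-rectifier figure below is inferred from "eight plus one to deliver the 24 kilowatts per rack," not from a published Tundra spec:

```python
# Hedged sketch of the 8+1 redundant power-shelf math.
RECTIFIERS_TOTAL = 9      # elements in one power shelf
RECTIFIERS_ACTIVE = 8     # needed to carry the rack load ("8+1")
RACK_LOAD_KW = 24.0       # target rack power

# Each active rectifier must supply its share of the rack load.
per_rectifier_kw = RACK_LOAD_KW / RECTIFIERS_ACTIVE      # 3.0 kW each
shelf_capacity_kw = per_rectifier_kw * RECTIFIERS_TOTAL  # 27.0 kW installed

# With any one rectifier failed, the remaining eight still cover the load:
surviving_capacity_kw = per_rectifier_kw * (RECTIFIERS_TOTAL - 1)
assert surviving_capacity_kw >= RACK_LOAD_KW
```

This is the "redundant power with one spare supply" point: the shelf carries roughly 12% more capacity than the rack needs, versus the 100% duplication of paired per-node supplies.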
I think another piece that was very attractive to them was the whole “open” part of Open Compute. With the OCP platform, you can buy nodes from a lot of different vendors to put into the standard 12-volt power system.
One of the things about OCP is that it offers us the opportunity to add different nodes, different processor types, and different accelerators. Provided they fit that form factor, we can slot them right into the rack and essentially reuse that infrastructure for a bunch of different types of computing solutions.
insideHPC: I’m curious about the nature of the workload for these systems. Are these entirely classified workloads?
Dan Dowling: It is a mix of workloads. The strategy that the NNSA has taken with their compute infrastructure is really two-pronged. The CTS (Commodity Technology Systems) side basically runs everything that gets the day-to-day job done, and the ATS (Advanced Technology Systems) side is that next-generation, far-reaching type of project centered around “how do we get to the next level?” For CTS, the idea is creating this common platform and common software stack that they call TOSS. They provide their users with a very well understood, very stable platform so that they’re able to transport their codes from one lab to another.
insideHPC: It sounds like this deal could open some doors for you. Do you think we’re going to see more Open Compute in HPC and specifically Tundra?
Dan Dowling: Yes. This is the US government’s validation of OCP, so I think we’re going to see a lot more of Tundra in HPC. There’s a lot of passion and a lot of interest around it.
It’s always bothered me looking at a rack full of 30 or 40 1U nodes, and how much waste is in there. That means there are 40, or even 80, power supplies in one rack. You just know that that’s not the right way to build things. But until now there really hasn’t been an open standard that everybody can build to. With OCP, you get the benefits of proprietary blade technology, but in an open form factor. So yeah, I think this is going to catch on.
insideHPC: We’ve been talking to your CTO Phil Pokorny about Open Compute for a few years now. From our side, it’s exciting to see this new OCP platform taking hold in HPC.
Dan Dowling: Well, thanks. Phil is passionate about open technologies, and Penguin Computing was founded on the idea of open source and Linux. Now we’re taking hardware to that next level, and driving open standards is something we’re thrilled about.