Supermicro’s New Line of AMD EPYC-based Systems: Addressing HPC Needs across the Spectrum

Supermicro recently launched its A+ line of systems based on AMD’s new 7nm EPYC microprocessors – products that include servers, storage systems, GPU-optimized platforms, SuperBlade, and multi-node Twin solutions designed, according to Vik Malyala, Supermicro’s Senior Vice President, FAE & Business Development, to exactly match system requirements for challenging enterprise workloads. In this interview, Malyala discusses the company’s track record of incorporating “the latest and greatest” into its products as soon as possible.

The new AMD-based offerings incorporate single-socket and dual-socket system solutions designed to reduce time-to-results and drive better business decisions. For example, the recently announced 2U 2-node multi-GPU server is a strong platform for video streaming, high-end cloud gaming and social networking applications. Malyala stressed that Supermicro prides itself on system flexibility that delivers cost savings and uninterrupted performance to customers. With innovative server designs that reduce power consumption while delivering top performance, Supermicro’s application-optimized servers enable modern enterprises to reduce costs and enhance the user experience.

Here’s a transcript of Malyala’s conversation with insideHPC editor-in-chief Doug Black.

Doug Black: Hi, I’m Doug Black at insideHPC. And today we’re talking with Vik Malyala, Senior Vice President at server maker Supermicro, which recently launched its server lineup based on AMD’s new 7003 series EPYC processors. Vik, welcome.

Vik Malyala: Thank you, Doug. Thanks for having me.

Black: So my interest naturally is in Supermicro’s offering for supercomputing and enterprise HPC. Can you give us a top-line assessment of your EPYC hardware portfolio for those markets?

Malyala: Absolutely. As you’re well aware, and so is our community, Supermicro made its name by bringing the latest and greatest, especially related to high performance computing, first to market. I think that’s always been our marker for success. And we continue to extend that leadership and that DNA.

And what we have done with EPYC — the number of cores, the frequency, and all the bells and whistles that come with it — is bring that into our innovative Supermicro architecture, and that’s how we are able to actually make a difference. The good part here is that we have realized over time that this whole HPC world has evolved quite a bit, not just at the very big national labs but also in research institutes and commercial and enterprise customers — every area that you can think of in HPC. The types of deployments vary quite a bit from one to another. And because of the way we bring products to market, we are able to address that.

So here’s the point: if you take a look at our dense computing — typically the Twin architecture and the blade — we have BigTwin®, we have TwinPro®, and we have SuperBlade®. This is the center of this market, and we are able to bring that to market. At the same time, we have also seen increased adoption of accelerators in HPC, and because of that we have to support different types of GPUs and vector accelerators. So we have our Delta and Redstone systems, which support 8 or 4 NVIDIA HGX A100 GPUs, respectively, with a very high-speed fabric between them. That is one type of product. Another is a 4U, eight-GPU platform in a standard PCIe form factor.

In addition to that, we also have products that are standard 1U and 2U “pizza boxes”; these are typically used for very large memory footprints or for the storage that HPC needs.

So the gamut of these products is quite vast, anywhere from dense computing going all the way to maximizing the I/O and memory footprint.

Black: Okay, great. Getting back to the new chip itself, what is Supermicro’s assessment of the new EPYC CPUs? How much of an advance are they over existing x86 chips? And please put that in the context of how that impacts your server lineup.

Malyala: It’s a very interesting point. With the previous generation of EPYC, AMD brought in up to 64 cores, 128 lanes of PCI Express Gen 4, 3200 MT/s memory and all that, so one might think that’s the big deal. The big deal here is that they were able to squeeze still more performance out of that envelope. For example, within the same thermal profile they were able to increase the core frequency on a single processor, or increase the amount of cache that goes on the processor — because one of the important things is how we improve the performance per core versus per socket. Especially when you’re going dual socket, the interconnect between the CPUs — its bandwidth and speed — also comes into the picture. When we turn all of this into what we can do on our platforms, what we have done is tune these platforms to make sure we bring the best of what AMD has to offer in the 3rd Gen EPYC, provide different form factors, get the GPU support, and connect the fastest interconnect — for example, multiple HDR 200Gb/s InfiniBand links — and support PCIe Gen 4 NVMe drives in U.2 or M.2 form factors.
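For a rough sense of what those platform figures translate to, here is a back-of-the-envelope sketch. It assumes the published 3rd Gen EPYC platform specs mentioned above — 8 DDR4 memory channels at 3200 MT/s and 128 PCIe Gen 4 lanes — and uses an approximate ~2 GB/s per PCIe lane per direction, ignoring protocol overhead:

```python
# Back-of-the-envelope bandwidth figures for a 3rd Gen EPYC socket.
# Assumptions (from the published platform specs, not this interview):
#   - 8 DDR4 channels per socket, 3200 MT/s, 8 bytes per transfer
#   - 128 PCIe Gen 4 lanes, ~2 GB/s per lane per direction (approx.)

DDR4_CHANNELS = 8
TRANSFERS_PER_SEC = 3200e6      # 3200 MT/s
BYTES_PER_TRANSFER = 8          # 64-bit channel

mem_bw_gbs = DDR4_CHANNELS * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9
print(f"Peak memory bandwidth per socket: {mem_bw_gbs:.1f} GB/s")   # 204.8 GB/s

PCIE_LANES = 128
GBS_PER_LANE = 2.0              # PCIe Gen 4, per direction, approximate
pcie_bw_gbs = PCIE_LANES * GBS_PER_LANE
print(f"Aggregate PCIe Gen 4 bandwidth: {pcie_bw_gbs:.0f} GB/s per direction")
```

These are theoretical peaks; sustained application numbers will be lower, but they frame why memory channels and lane counts matter for the HPC workloads discussed here.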

So if we take a look at all the different technologies available today — how we are able to bring them together and connect them with EPYC on our platforms — I think that’s how we are able to deliver the technology.

And if we look at the workloads, there are different workloads we can go after: ones focused on high frequency, like EDA (Electronic Design Automation) and automotive workloads, versus ones that are massively parallel. Then take something like WRF (the Weather Research and Forecasting Model), where you’re talking about larger numbers of cores and memory bandwidth. Again, we are able to bring them into Supermicro’s platforms and optimize, so that customers can take whatever fits their workload and run with it.

Black: Yes, that density that the seven nanometer brings, very impressive. So in your announcement of the EPYC-based servers you discuss record setting performance, can you share some details in that area as to how much faster your servers are…

Malyala: There are a couple of ways to look at it. One is making sure we deliver everything the processor has to offer — all the features and functionality without any compromise. Because one may have the top-end processor, but if you are not able to cool it, then it’s a problem. For example, we were able to support these processors with both forced-air cooling and direct liquid cooling, including immersion cooling, depending on what customers want and how they want to split the available power between compute and cooling. So that’s one way to look at it.

And from our point of view, we looked at the different workloads people could actually benefit from. Because, as you have seen, especially through the pandemic, people don’t have access to systems — they can’t receive them, put them in the lab and test them out. So we took it upon ourselves to run different benchmarks, even at the application level, to see what we can get and whether it’s something we can provide customers for easier adoption.

For example, if you look at MLPerf, which is a typical indicator for artificial intelligence and machine learning benchmarks, we have run these with our 3rd Gen EPYC-based Delta and Redstone systems (GPU systems that feature the NVIDIA HGX A100 8-GPU and 4-GPU, respectively) along with dual-socket 3rd Gen AMD EPYC processors — the official names are AS-4124GO-NART (Delta) and AS-2124GQ-NART (Redstone) — and we have published that data. So one can see it’s not just a product we are putting out; you can see a generational improvement, especially on the processor side: 8 to 10 percent on a very high bin (the EPYC product bins with high wattage, frequency and core counts), versus 20 to 30 percent improvement over the prior generation in the middle of the range.

Another thing we have seen is storage being a very important factor. We work with WekaIO, as an example, on a parallel file system. And with just six nodes, each with a single-socket EPYC processor in our WIO platform, we were able to reach about 217 gigabytes per second peak performance with roughly 7 million IOPS, which is pretty damn good. And that’s the reason I think people can take it as a turnkey solution or take it as a reference design and run with it.
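For scale, splitting those aggregate numbers evenly across the six nodes gives rough per-node figures. The per-node values below are derived here for illustration; they are not quoted in the interview:

```python
# Per-node figures implied by the six-node WekaIO result quoted above:
# ~217 GB/s aggregate throughput and ~7 million IOPS across 6 nodes.
# An even split is an approximation; real distribution may vary per node.

NODES = 6
TOTAL_GBS = 217.0
TOTAL_IOPS = 7_000_000

per_node_gbs = TOTAL_GBS / NODES        # ~36.2 GB/s per node
per_node_iops = TOTAL_IOPS / NODES      # ~1.17M IOPS per node

print(f"~{per_node_gbs:.1f} GB/s and ~{per_node_iops / 1e6:.2f}M IOPS per node")
```

Per node, that is well beyond what a handful of NVMe drives alone can sustain without a fast fabric, which is why the 200G interconnect mentioned elsewhere in the conversation matters for this result.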

Another one we have seen is …, one of the Java performance benchmarks people look at — whether it’s critical ops or max ops, we are able to bring the right value at the node level and also when scaling across multiple nodes. We ran it, for example, across our SuperBlade, and compared to the top end of the previous generation, when we plug in the new processors we see a 36 percent improvement in performance. This is phenomenal. And we are able to do that because we pair it with the right platform.

Others we have seen — while the final benchmark results are not out yet — include weather research and forecasting simulation and OpenFOAM, which mostly involves very complex fluid movements, explosions, chemical reactions and the like. Bottom line: it’s a huge effort for us to complete these runs. But we have put the effort in, and we saw performance improvements anywhere from 8 to 9 percent all the way up to 40 percent, depending on the workload and on the platform and processor you choose.

Black: Okay, pretty impressive speed-ups, no doubt. I was also impressed that Supermicro had its EPYC servers available — not just announced, but available the day AMD announced the new EPYCs. Isn’t that kind of unusual? Tell us how that came to happen.

Malyala: Unusual for most, but to be honest, Supermicro takes pride in bringing new technology to market first. We work closely with our technology partners, and that is what enables us to be first.

Another way to look at it is that Supermicro built the entire product portfolio on our Supermicro Building Block approach. So the different pieces of the puzzle are already there, and we were able to bring them together and evaluate.

One other thing here is that the socket is compatible. So for several of the platforms we have already been selling with Rome (2nd Gen AMD EPYC), we made them ready to go with Milan (3rd Gen AMD EPYC) via a BIOS update. And we were able to plug in Milan and get it going.

Over the last year to year and a half, we have also looked at the gaps in our product portfolio and added, for example, our FatTwin® product line, our two-node systems and our WIO platform, as well as the CloudDC platform. All of these platforms were added to expand the portfolio while filling in the gaps. The time we had, the products we had already developed, the building block approach, and everything being developed in-house gave us this unique ability to launch these products on day one.

Black: So with socket compatible building blocks you’re halfway there when the new chips come out – I see. How does Supermicro distinguish itself from other leading server makers competing in the HPC and supercomputing markets?

Malyala: Compare the overall strategy of Supermicro — our methodologies, building blocks and so on — with many HPC providers: they go top-down. They have a few SKUs and then push them across different workloads. Ours is exactly the opposite. Take a look at the Supermicro product line: single socket or dual socket, one-DIMM-per-channel or two-DIMMs-per-channel designs; a standard 1U or 2U pizza box, or a multi-node system going all the way to 20 nodes in a single enclosure, with 200G InfiniBand and NVMe as part of it; and different accelerators, whether you want to plug in one, two, four or eight GPUs, with whatever kind of fabric — all of these are validated and ready to roll.

Again, no two customers are the same, there is no one size fits all, and especially the way HPC is evolving, people have different needs. And this gives us this unique ability to differentiate from the competition and bring the right value to the customers. I think that’s where we shine. And that’s where we bring the value.

Black: So meet the customer where their HPC needs are. Well, great, Vik — great to be with you and with Supermicro today. Thanks for joining us.

Malyala: I appreciate it. Thanks for the opportunity.