Microway Deploys NVIDIA DGX POD-based AI Supercomputer at MSOE


Microway recently deployed an NVIDIA DGX POD-based supercomputer for education and applied research at the Milwaukee School of Engineering (MSOE). Called “Rosie,” the supercomputer forms the centerpiece of the university’s new computer science program and will support an expansion of deep learning and AI education designed to permeate the institution.

DGX POD is a reference architecture that provides a blueprint for designing large-scale data center infrastructure that can support modern artificial intelligence (AI) development. It is based on the NVIDIA DGX SATURNV AI supercomputer, which powers NVIDIA’s internal AI research and development for autonomous vehicles, robotics, graphics, HPC, and other domains.

“We are extremely pleased by the opportunity to work with NVIDIA and MSOE on this significant new education and applied research facility,” said Eliot Eshelman, VP of Strategic Accounts and HPC Initiatives at Microway. “Microway’s expertise, combined with NVIDIA’s DGX POD architecture, enabled us to deliver a new type of cluster that melds the best of HPC with the latest developments in deep learning. In addition to enabling new research, this cluster simplifies student usage for studies of data analytics, AI, and computer science.”

As an experienced cluster integrator and NVIDIA Partner Network Elite DGX partner, Microway was essential to delivering a complete solution that was operational on day one. Microway experts carried out careful system, storage, and network architecture design, followed by a design review with MSOE IT personnel and NVIDIA solutions architects, to meet MSOE’s specific AI education and computer science needs.

The cluster design includes three racks of DGX servers, high-speed storage, 100G networking, and management servers, along with NVIDIA NGC deep learning containers and the NVIDIA DGX software stack, deployed and managed with NVIDIA DeepOps. It features NVIDIA DGX-1 AI systems with NVIDIA V100 Tensor Core GPU accelerators; twenty Microway NumberSmasher Xeon + NVIDIA T4 GPU teaching compute nodes; and access to NGC, an online registry of software stacks optimized for deep learning, machine learning, and HPC applications, as well as pre-trained models and model-training scripts. The deployment also includes high-performance storage arrays and a larger general-purpose storage pool from storage partner NetApp.
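The NGC containers ship popular deep learning frameworks pre-built for NVIDIA GPUs. As an illustration only (a minimal sketch, not part of the MSOE deployment documentation), a session running inside an NGC PyTorch container could confirm which accelerators it sees, whether the V100s in the DGX-1 systems or the T4s in the teaching nodes, with a few lines of Python:

```python
# Minimal sketch: enumerate the GPUs visible to a session inside an NGC
# deep learning container (assumes an NGC PyTorch image, where torch is
# pre-installed).
import torch

if not torch.cuda.is_available():
    print("No CUDA-capable GPU visible to this session.")
else:
    count = torch.cuda.device_count()
    print(f"{count} GPU(s) visible:")
    for i in range(count):
        props = torch.cuda.get_device_properties(i)
        # On a cluster like this, a session might report V100 GPUs
        # (DGX-1 nodes) or T4 GPUs (teaching compute nodes).
        print(f"  GPU {i}: {props.name}, "
              f"{props.total_memory / 1024**3:.1f} GiB")
```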

Microway’s design and integration experts worked closely with the MSOE team to ensure the custom DGX POD-based configuration met user needs. After many weeks of intensive integration and stress testing at Microway’s facility, the cluster was delivered and installed fully integrated and ready to run. Thorough testing verified not only system functionality and stability but also performance, with analysis of GPU throughput, local NVMe cache throughput, and network storage throughput. The teams worked together to customize storage, networking, and cluster software.
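To give a rough sense of what a GPU throughput check can look like (an illustrative sketch, not Microway’s actual test suite; the matrix size and iteration count are arbitrary), a simple half-precision matrix-multiply microbenchmark in PyTorch might read:

```python
# Illustrative sketch only: a simple FP16 matrix-multiply microbenchmark
# of the kind used to sanity-check GPU throughput.
import time
import torch

def matmul_tflops(n=8192, iters=50, device="cuda"):
    a = torch.randn(n, n, device=device, dtype=torch.float16)
    b = torch.randn(n, n, device=device, dtype=torch.float16)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    # One n x n matrix multiply performs roughly 2 * n^3 floating point
    # operations.
    return 2 * n**3 * iters / elapsed / 1e12

if __name__ == "__main__":
    print(f"Approximate throughput: {matmul_tflops():.1f} TFLOPS")
```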

Revolutionary Deployment for Classroom Computer Science and AI Instruction

Unlike many university programs, where supercomputer access is usually limited to graduate students in computer labs, this configuration gives undergraduate students at MSOE supercomputer access in the classroom, helping train the next AI workforce. Traditional supercomputers require users to be familiar with command line interfaces and workload managers. The DeepOps installation Microway has provided to MSOE allows a student to access the “Rosie” cluster in a web browser and start a DGX-1 or NVIDIA T4 GPU deep learning session with the click of a button.
