Nvidia Brings AI to the Cloud with the HGX-1 Hyperscale GPU Accelerator


HGX-1 is based on the OCP Platform

Today, Microsoft, NVIDIA, and Ingrasys announced a new industry-standard design to accelerate artificial intelligence in the next-generation cloud.

The Project Olympus hyperscale GPU accelerator chassis for AI, also referred to as HGX-1, is designed to host eight of the latest “Pascal” generation NVIDIA GPUs linked by NVIDIA’s NVLink high-speed multi-GPU interconnect, and provides high-bandwidth connectivity for up to 32 GPUs when four HGX-1 chassis are joined together. The HGX-1 AI accelerator offers the performance scalability demanded by fast-growing machine learning workloads, and its design allows it to be adopted easily into existing datacenters around the world.
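For a concrete sense of what multi-GPU interconnectivity looks like from software, the following minimal CUDA sketch enumerates the GPUs visible to a process and reports which pairs can address each other's memory directly (over NVLink or PCIe). It is an illustration for any multi-GPU machine, not code from the HGX-1 specification.

```c
// Minimal CUDA sketch: enumerate GPUs and check peer-to-peer access
// between each pair (NVLink or PCIe). Illustrative; not HGX-1-specific.
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("Visible GPUs: %d\n", n);

    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s\n", i, prop.name);
    }

    // Report which GPU pairs support direct peer access.
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("GPU %d -> GPU %d: peer access %s\n",
                   i, j, canAccess ? "yes" : "no");
        }
    }
    return 0;
}
```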

Last November, Microsoft introduced Project Olympus as a next-generation cloud hardware design with a new model for open-source hardware development. At the 2017 Open Compute Project (OCP) U.S. Summit, the company is sharing how this first-of-its-kind open hardware development model has created a vibrant industry ecosystem for datacenter deployments across the globe, in both cloud and enterprise.

Microsoft calls its work with NVIDIA and Ingrasys just one of numerous stand-out examples of how the open-source strategy of Project Olympus has been embraced by the OCP community. The company says it is pleased by the broad support from the industry partners that have joined the Project Olympus ecosystem, and it sees this as a significant moment, ushering in a new era of open-source hardware development with the OCP community. Microsoft intends for Project Olympus to provide a blueprint for future hardware development and collaboration at cloud speed; the specification is available on the company’s OCP GitHub branch.

Powered by eight NVIDIA Tesla P100 GPUs in each chassis, HGX-1 features an innovative switching design based on NVIDIA NVLink interconnect technology and the PCIe standard, enabling a CPU to dynamically connect to any number of GPUs. This allows cloud service providers that standardize on the HGX-1 infrastructure to offer customers a range of CPU and GPU machine instance configurations.
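The machine-instance idea can be previewed in software today. Below is a hedged sketch, assuming a standard CUDA installation, that relies on the CUDA_VISIBLE_DEVICES environment variable to present a process with only a chosen subset of a chassis’s GPUs, much as a provider might carve an eight-GPU box into smaller instances. The program and launch command are illustrative assumptions, not part of the HGX-1 design.

```c
// Sketch: a process sees only the GPUs exposed to it. Launching with, e.g.,
//   CUDA_VISIBLE_DEVICES=0,1 ./instance
// presents a "2-GPU instance" view of an 8-GPU chassis. Illustrative only.
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("This instance was configured with %d GPU(s)\n", n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  device %d: %s, %zu MiB\n", i, prop.name,
               prop.totalGlobalMem >> 20);
    }
    return 0;
}
```

Launching the same binary with CUDA_VISIBLE_DEVICES=0,1,2,3 would report four devices; this is a crude software stand-in for the dynamic CPU-to-GPU attachment that the HGX-1’s NVLink/PCIe switching fabric provides in hardware.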

Cloud workloads are more diverse and complex than ever. AI training, inferencing, and HPC workloads run optimally on different system configurations, with a CPU attached to a varying number of GPUs. The highly modular design of the HGX-1 allows for optimal performance regardless of the workload: it provides up to 100x faster deep learning performance than legacy CPU-based servers, at an estimated one-fifth the cost for AI training and one-tenth the cost for AI inferencing.
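The 100x figure is a vendor benchmark claim for full deep learning workloads, and nothing a toy program reproduces, but the principle behind it, massively data-parallel arithmetic, can be sketched. The following hedged CUDA example times the same SAXPY operation (y = a*x + y) on the host CPU and on one GPU; the sizes, timing method, and operation are arbitrary choices for illustration, not a benchmark.

```c
// Hedged sketch: the same SAXPY (y = a*x + y) run on the CPU and on a GPU,
// to illustrate the data-parallel arithmetic behind deep learning speedups.
// Sizes and timing are arbitrary; this is an illustration, not a benchmark.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 24;  // ~16.8M elements (64 MB per array)
    float *x = (float *)malloc(n * sizeof(float));
    float *y = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Copy identical inputs to the device before the CPU pass mutates y.
    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, n * sizeof(float), cudaMemcpyHostToDevice);

    // CPU pass.
    clock_t t0 = clock();
    for (int i = 0; i < n; ++i) y[i] = 2.0f * x[i] + y[i];
    double cpu_ms = 1000.0 * (double)(clock() - t0) / CLOCKS_PER_SEC;

    // GPU pass, timed with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);

    printf("CPU: %.1f ms   GPU kernel: %.3f ms\n", cpu_ms, gpu_ms);

    cudaFree(dx); cudaFree(dy);
    free(x); free(y);
    return 0;
}
```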
