Sign up for our newsletter and get the latest HPC news and analysis.
Send me information from insideHPC:


New NVIDIA DGX A100 Packs Record 5 Petaflops of AI Performance for Training, Inference, and Data Analytics

NVIDIA DGX A100 exploded view

Today NVIDIA unveiled the NVIDIA DGX A100 AI system, delivering 5 petaflops of AI performance and consolidating the power and capabilities of an entire data center into a single flexible platform.

Immediately available, DGX A100 systems have begun shipping worldwide, with the first order going to the U.S. Department of Energy’s (DOE) Argonne National Laboratory, which will use the cluster’s AI and computing power to better understand and fight COVID-19.

NVIDIA DGX A100 is the ultimate instrument for advancing AI,” said Jensen Huang, founder and CEO of NVIDIA. “NVIDIA DGX is the first AI system built for the end-to-end machine learning workflow – from data analytics to training to inference. And with the giant performance leap of the new DGX, machine learning engineers can stay ahead of the exponentially growing size of AI models and data.”

DGX A100 systems integrate eight of the new NVIDIA A100 Tensor Core GPUs, providing 320GB of memory for training the largest AI datasets, and the latest high-speed NVIDIA Mellanox® HDR 200Gbps interconnects.

Multiple smaller workloads can be accelerated by partitioning the DGX A100 into as many as 56 instances per system, using the A100 multi-instance GPU feature. Combining these capabilities enables enterprises to optimize computing power and resources on demand to accelerate diverse workloads, including data analytics, training and inference, on a single, fully integrated, software-defined platform.

Immediate DGX A100 Adoption, Support

A number of the world’s largest companies, service providers and government agencies have placed initial orders for the DGX A100, with the first systems delivered to Argonne earlier this month.

We’re using America’s most powerful supercomputers in the fight against COVID-19, running AI models and simulations on the latest technology available, like the NVIDIA DGX A100,” said Rick Stevens, associate laboratory director for Computing, Environment and Life Sciences at Argonne. “The compute power of the new DGX A100 systems coming to Argonne will help researchers explore treatments and vaccines and study the spread of the virus, enabling scientists to do years’ worth of AI-accelerated work in months or days.”

The University of Florida will be the first institution of higher learning in the U.S. to receive DGX A100 systems, which it will deploy to infuse AI across its entire curriculum to foster an AI- enabled workforce.

The University of Florida has a vision to be a national leader in artificial intelligence, and NVIDIA is an incredibly valuable partner in our quest to do so,” said University of Florida President Kent Fuchs. “Across disciplines, our new NVIDIA DGX A100 systems will position our researchers to solve some of our world’s most pressing challenges and equip an entire generation of students with the skills that will revolutionize the future workforce.”

Among other early adopters are:

  • The Center for Biomedical AI — at the University Medical Center Hamburg-Eppendorf, Germany — will leverage DGX A100 to advance clinical decision support and process optimization.
  • Chulalongkorn University — Thailand’s top research-intensive university — will use DGX A100 to accelerate its pioneering research such as Thai natural language processing, automatic speech recognition, computer vision and medical imaging.
  • Element AI — a Montreal-based developer of AI-powered solutions and services — is deploying DGX A100 to accelerate performance and feature optimization for its Orkestrator GPU scheduler to meet growing AI training and application demands.
  • German Research Center for Artificial Intelligence (DFKI) will use the DGX A100 systems to further accelerate its research on new deep learning methods and their explainability while significantly reducing space and energy consumption.
  • Harrison.ai — a Sydney-based healthcare AI company — will deploy Australia’s first DGX A100 systems to accelerate the development of its AI-as-medical-device.
  • The UAE Artificial Intelligence Office – first in the Middle East to deploy the new DGX A100 – is building a national infrastructure to accelerate AI research, development and adoption across the public and private sector.
  • VinAI Research — Vietnam’s leading AI research lab, based in Hanoi and Ho Chi Minh City — will use DGX A100 to conduct high-impact research and accelerate the application of AI.

Thousands of previous-generation DGX systems are in use around the globe by a wide range of public and private organizations. Among them are some of the world’s leading businesses, including automakers, healthcare providers, retailers, financial institutions and logistics companies that are pushing AI forward across their industries.

NVIDIA DGX Superpod

NVIDIA Builds Next-Gen 700 Petaflops DGX SuperPOD

NVIDIA also revealed its next-generation DGX SuperPOD, a cluster of 140 DGX A100 systems capable of achieving 700 petaflops of AI computing power. Combining 140 DGX A100 systems with NVIDIA Mellanox HDR 200Gbps InfiniBand interconnects, NVIDIA built its own next- generation DGX SuperPOD AI supercomputer for internal research in areas such as conversational AI, genomics and autonomous driving.

The cluster is one of the world’s fastest AI supercomputers — achieving a level of performance that previously required thousands of servers. The enterprise-ready architecture and performance of DGX A100 enabled NVIDIA to build the system in just one month, instead of taking months or years of planning and procurement of specialized components previously required to deliver these supercomputing capabilities.

To help customers build their own A100-powered data centers, NVIDIA has released a new DGX SuperPOD reference architecture. It gives customers a blueprint that follows the same design principles and best practices NVIDIA used to build its DGX A100-based AI supercomputing cluster.

DGXpert Program, DGX-Ready Software

NVIDIA also launched the NVIDIA DGXpert program, which brings together DGX customers with the company’s AI experts; and the NVIDIA DGX-Ready Software program, which helps customers take advantage of certified, enterprise-grade software for AI workflows. DGXperts are AI-fluent specialists who can help guide clients on AI deployments, from planning to implementation to ongoing optimization. These individuals can help DGX A100 customers build and maintain state-of-the-art AI infrastructure.

The NVIDIA DGX-Ready Software program helps DGX customers quickly identify and take advantage of NVIDIA-tested third-party MLOps software that can help them increase data science productivity, accelerate AI workflows and improve accessibility and utilization of AI infrastructure. The first program partners certified by NVIDIA are Allegro AI, cnvrg.io, Core Scientific, Domino Data Lab, Iguazio and Paperspace.

DGX A100 Technical Specifications:

    • Eight NVIDIA A100 Tensor Core GPUs, delivering five petaflops of AI power, with 320GB in total GPU memory with 12.4TB per second in bandwidth.
    • Six NVIDIA NVSwitchTM interconnect fabrics with third-generation NVIDIA NVLink® technology for 4.8TB per second of bi-directional bandwidth.
    • Nine NVIDIA Mellanox ConnectX-6 HDR 200Gb per second network interfaces, offering a total of 3.6Tb per second of bi-directional bandwidth.
    • Leverages NVIDIA Mellanox’s In-Network Computing and network acceleration engines such as RDMA, GPUDirect® and Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)TM to enable the highest performance and scalability.
    • Powered by the NVIDIA DGX software stack, which includes optimized software for AI and data science workloads, delivering maximized performance, enabling enterprises to achieve a faster return on their investment in AI infrastructure.
    • A single rack of five DGX A100 systems replaces a data center of AI training and inference infrastructure, with 1/20th the power, 1/25th the space and 1/10th the cost.

Availability

NVIDIA DGX A100 systems start at $199,000 and are shipping now through NVIDIA Partner Network resellers worldwide. Storage technology providers DDN Storage, Dell Technologies, IBM, NetApp, Pure Storage and Vast plan to integrate DGX A100 into their offerings, including those based on the NVIDIA DGX POD and DGX SuperPOD reference architectures.

NVIDIA DGX-Ready Data Center partners offer colocation services in more than 122 locations across 26 countries to help customers seeking cost-effective facilities to host their DGX infrastructure. DGX A100 customers will be able to take advantage of these services to house and access DGX A100 infrastructure inside validated, world-class data center facilities.

In this video, NVIDIA CEO Jensen Huang summarizes his GTC 2020 Kitchen Keynote.

Sign up for our insideHPC Newsletter

Resource Links: