AWS Announces AI Servers with NVIDIA Blackwell

Amazon Web Services announced general availability of P6e-GB200 UltraServers with NVIDIA Grace Blackwell Superchips. The servers are designed for training and deploying large-scale AI models, and they follow the launch earlier this year of P6-B200 instances, which also use NVIDIA Blackwell GPUs, for AI and high-performance computing workloads.

The new servers are AWS’s most powerful GPU offering, according to David Brown, vice president of AWS Compute & Machine Learning Services. They feature up to 72 NVIDIA Blackwell GPUs interconnected using fifth-generation NVIDIA NVLink — all functioning as a single compute unit, Brown said.

Each UltraServer delivers 360 petaflops of FP8 compute and 13.4 TB of total high-bandwidth GPU memory (HBM3e). Brown said this is more than 20 times the compute and over 11 times the memory available in a single NVLink domain compared with P5en instances. P6e-GB200 UltraServers support up to 28.8 Tbps of aggregate fourth-generation Elastic Fabric Adapter (EFAv4) networking bandwidth.

Each P6-B200 instance provides 8 NVIDIA Blackwell GPUs interconnected using NVLink with 1.4 TB of high-bandwidth GPU memory, up to 3.2 Tbps of EFAv4 networking, and fifth-generation Intel Xeon Scalable processors. P6-B200 instances offer up to 2.25 times the GPU TFLOPs, 1.27 times the GPU memory size, and 1.6 times the GPU memory bandwidth compared with P5en instances.
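As a quick sanity check, the announced aggregates imply rough per-GPU figures. The sketch below simply divides the stated totals by the GPU counts; these derived per-GPU numbers are back-of-envelope estimates, not official per-GPU specifications.

```python
# Per-GPU figures implied by the aggregates AWS quotes for
# P6e-GB200 UltraServers (72 GPUs) and P6-B200 instances (8 GPUs).
# All inputs come from the announcement; the splits are simple division.

# P6e-GB200 UltraServer aggregates
ultra_gpus = 72
ultra_fp8_pflops = 360        # total FP8 compute, petaflops
ultra_hbm_tb = 13.4           # total HBM3e, TB

# P6-B200 instance aggregates
b200_gpus = 8
b200_hbm_tb = 1.4             # total GPU memory, TB

per_gpu_pflops = ultra_fp8_pflops / ultra_gpus        # 5.0 PFLOPS FP8 per GPU
per_gpu_mem_gb = ultra_hbm_tb * 1000 / ultra_gpus     # ~186 GB HBM3e per GPU
b200_per_gpu_mem_gb = b200_hbm_tb * 1000 / b200_gpus  # ~175 GB per GPU

print(f"P6e-GB200: {per_gpu_pflops:.1f} PFLOPS FP8, {per_gpu_mem_gb:.0f} GB per GPU")
print(f"P6-B200:   {b200_per_gpu_mem_gb:.0f} GB per GPU")
```

Both figures are consistent with a Blackwell-class GPU carrying on the order of 180 GB of HBM3e.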

AWS said P6e-GB200 UltraServers are suited for compute- and memory-intensive AI workloads, such as training and deploying frontier models at the trillion-parameter scale. The NVIDIA GB200 NVL72 architecture achieves efficiencies by reducing communication overhead between GPU nodes. For inference workloads, the ability to fully contain trillion-parameter models within a single NVLink domain means faster, more consistent response times at scale, AWS said.

When combined with optimization techniques such as disaggregated serving with NVIDIA Dynamo, the large domain size of the GB200 NVL72 architecture delivers inference efficiencies for a range of model architectures, such as mixture-of-experts models. Brown said GB200 NVL72 is particularly powerful for handling extra-large context windows or running high-concurrency applications in real time.

“Imagine a system that can explore multiple approaches to complex problems, drawing on its understanding of vast amounts of data, from scientific datasets to source code to business documents, and reasoning through the possibilities in real time,” Brown stated in a blog post. “This lightning-fast reasoning isn’t waiting on the horizon. It’s happening today in our customers’ AI production environments. The scale of the AI systems that our customers are building today—across drug discovery, enterprise search, software development, and more—is truly remarkable. And there’s much more ahead.”

Brown said P6e-GB200 UltraServers have been deployed in third-generation EC2 UltraClusters, which create a single fabric for AWS’s largest data centers. Third-generation UltraClusters cut power consumption by up to 40 percent and reduce cabling requirements by more than 80 percent, improving efficiency and reducing potential points of failure, according to Brown.

AWS uses Elastic Fabric Adapter (EFA) with its Scalable Reliable Datagram protocol, which routes traffic across multiple network paths, built to maintain smooth operation even during congestion or failures. Brown said P6e-GB200 and P6-B200 instances with EFAv4 show up to 18 percent faster collective communications in distributed training compared to P5en instances that use EFAv3.

While P6-B200 instances are air-cooled, P6e-GB200 UltraServers use liquid cooling, which enables higher compute density in large NVLink domain architectures, according to AWS. P6e-GB200 servers provide configurable liquid-to-chip cooling in both new and existing data centers, “so we can support both liquid-cooled accelerators and air-cooled network and storage infrastructure in the same facility,” Brown wrote.

P6e-GB200 UltraServers will also be available through NVIDIA DGX Cloud, a unified AI platform that includes NVIDIA’s AI software stack.