AWS Launches Nvidia GPU-Driven EC2 P4d Instances for AI, HPC


Amazon Web Services today announced the general availability of Amazon EC2 P4d instances, powered by Nvidia GPUs and offering EC2 UltraClusters capability. According to AWS, the new instances deliver 3x faster performance, up to 60 percent lower cost, and 2.5x more GPU memory for machine learning training and HPC workloads compared with previous-generation P3 instances.

The company said P4d instances feature eight Nvidia A100 Tensor Core GPUs and 400 Gbps of network bandwidth (16x more than P3 instances). Combining P4d instances with AWS's Elastic Fabric Adapter (EFA) and Nvidia GPUDirect RDMA (remote direct memory access) enables the EC2 UltraClusters capability, in which instances can scale to more than 4,000 A100 GPUs on AWS-designed non-blocking, petabit-scale networking infrastructure integrated with Amazon FSx for Lustre high-performance storage. The result, AWS said, is on-demand access to supercomputing-class performance to accelerate ML training and HPC.
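For readers wondering what EFA and GPUDirect RDMA look like from a training script, the sketch below shows a minimal multi-node PyTorch setup. It assumes the EFA driver and the aws-ofi-nccl plugin are already installed (as they are in the AWS Deep Learning AMIs and Containers) and that the script is launched with a distributed launcher such as torchrun on each node; the environment variable names follow AWS's published EFA/NCCL guidance and are assumptions here, not details from the announcement.

```python
# Minimal sketch: routing NCCL collectives over EFA with GPUDirect RDMA on P4d.
# Assumes the EFA driver and aws-ofi-nccl plugin are installed and the script is
# launched with torchrun (one process per GPU, eight per P4d node).
import os

os.environ.setdefault("FI_PROVIDER", "efa")           # use the EFA libfabric provider
os.environ.setdefault("FI_EFA_USE_DEVICE_RDMA", "1")  # enable GPUDirect RDMA on A100s
os.environ.setdefault("NCCL_DEBUG", "INFO")           # log which transport NCCL picked

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")    # NCCL handles cross-node GPU communication
local_rank = int(os.environ["LOCAL_RANK"]) # set by torchrun
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda() # stand-in for a real model
model = DDP(model, device_ids=[local_rank])  # gradient sync runs via NCCL over EFA
```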

The 3x increased performance of P4d instances reduces training times for machine learning models from days to hours, according to AWS, and the additional GPU memory helps customers train larger, more complex models.

“As data becomes more abundant, customers are training models with millions and sometimes billions of parameters, like those used for natural language processing for document summarization and question answering, object detection and classification for autonomous vehicles, image classification for large-scale content moderation, recommendation engines for e-commerce websites, and ranking algorithms for intelligent search engines—all of which require increasing network throughput and GPU memory,” the company said.

With eight Nvidia A100 GPUs, a single P4d instance delivers up to 2.5 petaflops of mixed-precision performance and 320 GB of high-bandwidth GPU memory. AWS said P4d instances are the first in the industry to offer 400 Gbps network bandwidth with Elastic Fabric Adapter (EFA) and Nvidia GPUDirect RDMA network interfaces to enable direct communication between GPUs across servers for lower latency and higher scaling efficiency.
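The mixed-precision figure assumes workloads that actually exercise the A100 Tensor Cores. In PyTorch, for example, that is typically done with automatic mixed precision (AMP); the sketch below is illustrative only, with a placeholder model and random data rather than anything from the article.

```python
# Minimal sketch of automatic mixed precision (AMP) in PyTorch, the usual way to
# tap the A100 Tensor Cores behind the mixed-precision petaflops figure.
# The model, optimizer, and data are placeholders.
import torch

model = torch.nn.Sequential(torch.nn.Linear(4096, 4096), torch.nn.ReLU()).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()        # scales the loss to avoid FP16 underflow

for _ in range(10):                         # stand-in training loop
    x = torch.randn(64, 4096, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():         # matmuls run in reduced precision on Tensor Cores
        loss = model(x).square().mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```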

Each P4d instance also offers 96 Intel Xeon Scalable (Cascade Lake) vCPUs, 1.1 TB of system memory, and 8 TB of local NVMe storage to reduce single-node training times. “By more than doubling the performance of the previous generation of P3 instances, P4d instances can lower the cost to train machine learning models by up to 60 percent, providing customers greater efficiency over expensive and inflexible on-premises systems,” AWS said. “HPC customers will also benefit from P4d’s increased processing performance and GPU memory for demanding workloads like seismic analysis, drug discovery, DNA sequencing, materials science, and financial and insurance risk modeling.”

P4d instances are also built on the AWS Nitro System, AWS-designed hardware and software that has enabled AWS to offer a broader selection of EC2 instances and configurations. P4d instances offload networking functions to dedicated Nitro Cards that accelerate data transfer between multiple P4d instances. Nitro Cards also enable EFA and GPUDirect, which allows for direct cross-server communication between GPUs, facilitating lower latency and better scaling performance across EC2 UltraClusters of P4d instances, according to AWS.

“The pace at which our customers have used AWS services to build, train, and deploy machine learning applications has been extraordinary. At the same time, we have heard from those customers that they want an even lower cost way to train their massive machine learning models,” said Dave Brown, Vice President, EC2, AWS. “Now, with EC2 UltraClusters of P4d instances powered by Nvidia’s latest A100 GPUs and petabit-scale networking, we’re making supercomputing-class performance available to virtually everyone, while reducing the time to train machine learning models by 3x, and lowering the cost to train by up to 60% compared to previous generation instances.”

AWS said customers can run P4d instances using AWS Deep Learning Containers, with support for Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS). For a more fully managed experience, customers can use P4d instances via Amazon SageMaker, which is designed to help developers and data scientists build, train, and deploy ML models quickly.
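As a rough illustration of the SageMaker route, the sketch below launches a training job on P4d capacity through the SageMaker Python SDK. The IAM role ARN, entry-point script, S3 path, and framework/Python versions are placeholders; ml.p4d.24xlarge is the SageMaker instance type corresponding to the EC2 p4d.24xlarge described here.

```python
# Minimal sketch of a SageMaker training job on P4d capacity.
# Role ARN, script name, S3 path, and versions are placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                               # placeholder training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    instance_count=2,                                     # two P4d nodes = 16 A100 GPUs
    instance_type="ml.p4d.24xlarge",
    framework_version="1.8.1",                            # example version, adjust as needed
    py_version="py36",
)

estimator.fit({"training": "s3://my-bucket/training-data/"})  # placeholder S3 location
```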

HPC customers can use AWS Batch and AWS ParallelCluster with P4d instances to help orchestrate jobs and clusters. P4d instances support all major machine learning frameworks, including TensorFlow, PyTorch, and Apache MXNet, giving customers the flexibility to choose their preferred framework. P4d instances are available in US East (N. Virginia) and US West (Oregon), with availability planned for additional regions soon. P4d instances can be purchased On-Demand, with Savings Plans, as Reserved Instances, or as Spot Instances.
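For teams launching P4d capacity directly rather than through a managed service, a minimal boto3 sketch looks like the following. The AMI ID, key pair, and subnet are placeholders; uncommenting the InstanceMarketOptions argument would request Spot capacity instead of On-Demand.

```python
# Minimal sketch of launching a single P4d instance with boto3.
# AMI ID, key pair, and subnet are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # one of the launch regions

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",       # placeholder, e.g. a Deep Learning AMI
    InstanceType="p4d.24xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                 # placeholder key pair
    SubnetId="subnet-0123456789abcdef0",   # placeholder subnet
    # InstanceMarketOptions={"MarketType": "spot"},  # request Spot instead of On-Demand
)
print(response["Instances"][0]["InstanceId"])
```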

GE Healthcare is the $16.7 billion healthcare business of GE. “At GE Healthcare, we provide clinicians with tools that help them aggregate data, apply AI and analytics to that data and uncover insights that improve patient outcomes, drive efficiency and eliminate errors,” said Karley Yoder, VP & GM, Artificial Intelligence, at GE Healthcare. “Our medical imaging devices generate massive amounts of data that need to be processed by our data scientists. With previous GPU clusters, it would take days to train complex AI models, such as Progressive GANs, for simulations and view the results. Using the new P4d instances reduced processing time from days to hours. We saw two- to three-times greater speed on training models with various image sizes, while achieving better performance with increased batch size and higher productivity with a faster model development cycle.”

Toyota Research Institute (TRI), founded in 2015, is working to develop automated driving, robotics, and other human amplification technology for Toyota. “At TRI, we’re working to build a future where everyone has the freedom to move,” said Mike Garrison, Technical Lead, Infrastructure Engineering at TRI. “The previous generation P3 instances helped us reduce our time to train machine learning models from days to hours and we are looking forward to utilizing P4d instances, as the additional GPU memory and more efficient float formats will allow our machine learning team to train with more complex models at an even faster speed.”