Amazon Web Services and NVIDIA will collaborate on a scalable AI infrastructure for training large language models (LLMs) and developing generative AI applications.

The project involves Amazon EC2 P5 instances powered by NVIDIA H100 Tensor Core GPUs, combined with AWS networking and scalability that the companies say will deliver up to 20 exaFLOPS of compute performance. P5 instances will be the first GPU-based instances to use AWS’s second-generation Elastic Fabric Adapter networking, which provides 3,200 Gbps of low-latency, high-bandwidth throughput and enables customers to scale up to 20,000 H100 GPUs in EC2 UltraClusters.

Each P5 instance features eight NVIDIA H100 GPUs capable of 16 petaFLOPS of mixed-precision performance, 640 GB of high-bandwidth memory, and 3,200 Gbps of networking connectivity (8x more than the previous generation). P5 instances accelerate time-to-train for machine learning (ML) models by up to 6x, and the additional GPU memory helps customers train larger, more complex models, the companies said. According to NVIDIA, P5 instances are expected to lower the cost to train ML models by up to 40 percent compared with the previous generation.

More information can be found here.