AWS Announces Hpc7g EC2 Instance Powered by New Arm-based Graviton3E Chip

At AWS’s re:Invent conference yesterday, Amazon Web Services announced three EC2 instance types powered by new AWS-designed chips.

AWS said its EC2 Hpc7g instances, powered by new AWS Graviton3E chips, offer up to 2x better floating-point performance than current-generation C6gn instances and up to 20 percent higher performance than current-generation Hpc6a instances. The instances target computational fluid dynamics (CFD), weather simulation, genomics, molecular dynamics and other HPC workloads on clusters scaling to tens of thousands of cores. According to AWS, the instances combine high memory bandwidth with 200 Gbps of Elastic Fabric Adapter (EFA) network bandwidth, and they can be used with AWS ParallelCluster, an open-source cluster management tool, to provision Hpc7g instances alongside other instance types, allowing customers to run different workload types within the same cluster.
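For a rough sense of what provisioning such an instance looks like at the API level (outside of ParallelCluster, which manages this for you), the following sketch uses the boto3 EC2 client to request a single Hpc7g instance with an EFA network interface. The AMI, subnet, security group, placement group, and instance size are all placeholders, not values from the announcement.

```python
import boto3

# Hypothetical example: launch one Hpc7g instance with an EFA-enabled
# network interface. All resource IDs below are placeholders; substitute
# values from your own account.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",        # placeholder HPC-ready AMI
    InstanceType="hpc7g.16xlarge",          # placeholder Hpc7g size
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",             # request an Elastic Fabric Adapter
        "SubnetId": "subnet-0123456789abcdef0",
        "Groups": ["sg-0123456789abcdef0"],
    }],
    # Cluster placement groups keep instances close together for low-latency
    # MPI-style communication over EFA.
    Placement={"GroupName": "hpc-cluster"},
)
print(response["Instances"][0]["InstanceId"])
```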

Inf2 instances, powered by new AWS Inferentia2 chips, are built to run large deep learning models (e.g., large language models, image generation, and automatic speech recognition) with up to 175 billion parameters, AWS said. Inf2 is the first EC2 instance type to support distributed inference, a technique that spreads a single large model across several chips. Inf2 instances also support stochastic rounding, which rounds probabilistically rather than always to the nearest representable value; AWS says this yields both high performance and higher accuracy than legacy rounding modes. AWS said Inf2 instances support a range of data types, including CFP8 (configurable FP8), which improves throughput and reduces power per inference, and FP32, which boosts performance of modules that have not yet taken advantage of lower-precision data types.
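To make the stochastic rounding idea concrete, here is a minimal NumPy sketch (not AWS's implementation, which happens in Inferentia2 hardware, and shown here for integers rather than FP8 values): each value is rounded up with probability equal to its fractional part, so the rounded result is unbiased in expectation and rounding error does not accumulate as a systematic bias the way round-to-nearest error can when many small low-precision values are summed.

```python
import numpy as np

def stochastic_round(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Round each element to an adjacent integer, choosing the upper
    neighbor with probability equal to the fractional part, so that
    E[stochastic_round(x)] == x."""
    lower = np.floor(x)
    frac = x - lower                        # distance to lower neighbor, in [0, 1)
    round_up = rng.random(x.shape) < frac   # round up with probability `frac`
    return lower + round_up

rng = np.random.default_rng(0)
x = np.full(100_000, 0.1)
# Round-to-nearest would map every 0.1 to 0.0, losing the signal entirely;
# stochastic rounding preserves the mean.
print(stochastic_round(x, rng).mean())      # ~0.1
```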

AWS said Inf2 instances offer up to 4x the throughput and up to 10x lower latency compared to current-generation Inf1 instances, and they also offer up to 45 percent better performance per watt compared to GPU-based instances. Inf2 instances are available today in preview.

The company said C7gn instances, featuring new AWS Nitro Cards, offer up to 2x the network bandwidth and up to 50 percent higher packet-processing performance compared to current-generation networking-optimized instances.

AWS said that since introducing the AWS Nitro System in 2013, the company has developed multiple families of AWS-designed silicon, including five generations of the Nitro System, three generations of Graviton chips, two generations of Inferentia chips for ML inference, and Trainium chips for ML training. AWS uses cloud-based electronic design automation (EDA) as part of an agile development cycle for the design and verification of its silicon.

“Each generation of AWS-designed silicon—from Graviton to Trainium and Inferentia chips to Nitro Cards—offers increasing levels of performance, lower cost, and power efficiency for a diverse range of customer workloads,” said David Brown, vice president of Amazon EC2 at AWS. “That consistent delivery, combined with our customers’ abilities to achieve superior price performance using AWS silicon, drives our continued innovation. The Amazon EC2 instances we’re introducing today offer significant improvements for HPC, network-intensive, and ML inference workloads, giving customers even more instances to choose from to meet their specific needs.”