At SC20: AMD Says New GPU Surpasses 10 TFLOPS for HPC Acceleration


AMD today announced the new Instinct MI100 accelerator, which the company said is the first x86 server GPU to surpass 10 teraflops of double-precision (FP64) performance.

Built on the new AMD CDNA architecture, the Instinct MI100 “enables a new class of accelerated systems for HPC and AI when paired with 2nd Gen AMD Epyc processors,” the company said. The MI100 offers up to 11.5 TFLOPS of peak FP64 performance for HPC and up to 46.1 TFLOPS of peak FP32 Matrix performance for AI and machine learning workloads, according to AMD. With AMD Matrix Core technology, the MI100 also delivers a nearly 7x boost in theoretical peak FP16 floating-point performance for AI training workloads compared with AMD’s prior-generation accelerators, the company said.
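
Those vector numbers can be sanity-checked against the MI100’s widely reported configuration, roughly 7,680 FP32 lanes (120 compute units) at a boost clock of about 1.5 GHz; neither figure appears in AMD’s announcement above, so treat this as a back-of-the-envelope check rather than the company’s own arithmetic:

\[
7680\ \text{FP32 lanes} \times 2\ \tfrac{\text{FLOPs}}{\text{clock}}\ (\text{FMA}) \times {\sim}1.5\ \text{GHz} \approx 23.1\ \text{TFLOPS (FP32)},\qquad
\text{FP64 at half rate} \approx 11.5\ \text{TFLOPS}
\]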

Expected in compute platforms from Dell, Gigabyte, HPE, and Supermicro by the end of the year, the MI100, combined with AMD Epyc CPUs and the ROCm 4.0 open software platform, “is designed to propel new discoveries ahead of the exascale era,” AMD said.

AMD said the ROCm developer software “provides the foundation for exascale computing. As an open source toolset consisting of compilers, programming APIs and libraries, ROCm is used by exascale software developers to create high performance applications. ROCm 4.0 has been optimized to deliver performance at scale for MI100-based systems. ROCm 4.0 has upgraded the compiler to be open source and unified to support both OpenMP® 5.0 and HIP. PyTorch and TensorFlow frameworks, which have been optimized with ROCm 4.0, can now achieve higher performance with MI100. ROCm 4.0 is the latest offering for HPC, ML and AI application developers that allows them to create performance-portable software.”
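
For readers unfamiliar with HIP, the sketch below shows the kind of portable, single-source C++ the ROCm toolchain compiles for MI100-class GPUs. It is a minimal illustrative vector-add, not code from AMD or any vendor documentation quoted here; it omits error checking for brevity and builds with hipcc from a ROCm installation.

    // Illustrative only: a minimal HIP vector-add, the "hello world" of the
    // programming model ROCm's HIP compiler targets on MI100-class GPUs.
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void vec_add(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one element per thread
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);
        float *da, *db, *dc;
        hipMalloc((void**)&da, n * sizeof(float));      // device buffers
        hipMalloc((void**)&db, n * sizeof(float));
        hipMalloc((void**)&dc, n * sizeof(float));
        hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

        // Launch one thread per element in blocks of 256.
        dim3 block(256), grid((n + 255) / 256);
        hipLaunchKernelGGL(vec_add, grid, block, 0, 0, da, db, dc, n);

        hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
        std::printf("c[0] = %.1f (expect 3.0)\n", hc[0]);

        hipFree(da); hipFree(db); hipFree(dc);
        return 0;
    }

Because HIP closely mirrors the CUDA runtime API, the same source can be recompiled for other vendors’ GPUs, which is the portability point the Oak Ridge comment below alludes to.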

“We’ve received early access to the MI100 accelerator, and the preliminary results are very encouraging,” said Bronson Messer, director of science, Oak Ridge Leadership Computing Facility. “We’ve typically seen significant performance boosts, up to 2-3x compared to other GPUs. What’s also important to recognize is the impact software has on performance. The fact that the ROCm open software platform and HIP developer tool are open source and work on a variety of platforms is something that we have been almost obsessed with since we fielded the very first hybrid CPU/GPU system.”

Features of the AMD Instinct MI100 accelerator include:

  • Up to 11.5 TFLOPS of peak FP64 performance and 23.1 TFLOPS of peak FP32 performance.
  • Matrix Core technology delivering single and mixed-precision matrix operations, such as FP32, FP16, bFloat16, Int8 and Int4, for converged HPC and AI workloads.
  • 2nd Gen AMD Infinity Fabric Technology – Instinct MI100 provides ~2x the peer-to-peer (P2P) peak I/O bandwidth over PCIe 4.0 with up to 340 GB/s of aggregate bandwidth per card with three AMD Infinity Fabric Links. MI100 GPUs can be configured in a server with up to two fully-connected quad GPU hives, each providing up to 552 GB/s of P2P I/O bandwidth for fast data sharing.
  • 32 GB of high-bandwidth HBM2 memory at a clock rate of 1.2 GHz, delivering 1.23 TB/s of memory bandwidth (see the arithmetic after this list) to support large data sets and help eliminate bottlenecks in moving data in and out of memory.
  • Support for PCIe Gen 4.0, providing up to 64 GB/s of peak theoretical transport data bandwidth from CPU to GPU.
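
The quoted memory bandwidth can be checked from the numbers above plus one datum the list omits: the MI100’s HBM2 stacks present a 4,096-bit aggregate interface (a widely published spec, not stated in AMD’s announcement here). With double-data-rate signaling at the quoted 1.2 GHz clock:

\[
1.2\ \text{GHz} \times 2\ (\text{DDR}) \times \frac{4096\ \text{bits}}{8\ \text{bits/byte}} = 1{,}228.8\ \text{GB/s} \approx 1.23\ \text{TB/s}
\]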

“Today AMD takes a major step forward in the journey toward exascale computing as we unveil the AMD Instinct MI100 – the world’s fastest HPC GPU,” said Brad McCredie, corporate vice president, Data Center GPU and Accelerated Processing, AMD. “Squarely targeted toward the workloads that matter in scientific computing, our latest accelerator, when combined with the AMD ROCm open software platform, is designed to provide scientists and researchers a superior foundation for their work in HPC.”