

World’s First 7nm GPU and Fastest Double Precision PCIe Card

This guest post from AMD showcases the features of its new Radeon Instinct compute products, including the AMD Radeon Instinct MI60 and Radeon Instinct MI50 accelerators.


AMD recently announced two new Radeon Instinct™ compute products, the AMD Radeon Instinct™ MI60 and Radeon Instinct™ MI50 accelerators, which are the first GPUs in the world based on the advanced 7nm FinFET process technology. Moving to 7nm allows us to put more transistors onto an even smaller package than was possible before. In this case, the MI60 contains 13.2 billion transistors on a package size of 331.46 mm², while the previous generation Radeon Instinct™ MI25 had 12.5 billion transistors on a package size of 494.8 mm². This marks a 58 percent improvement in the number of transistors per mm². This allows us to provide a more powerful and robust product, capable of tackling a wide range of workloads, from training and inference to high performance computing.
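The 58 percent figure can be checked from the transistor counts and package sizes quoted above. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and not part of any AMD tool.

```python
# Transistor counts and package sizes as quoted in the paragraph above.
MI60_TRANSISTORS = 13.2e9
MI60_AREA_MM2 = 331.46
MI25_TRANSISTORS = 12.5e9
MI25_AREA_MM2 = 494.8


def density_per_mm2(transistors, area_mm2):
    """Return the number of transistors per square millimetre."""
    return transistors / area_mm2


def percent_improvement(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


mi60_density = density_per_mm2(MI60_TRANSISTORS, MI60_AREA_MM2)
mi25_density = density_per_mm2(MI25_TRANSISTORS, MI25_AREA_MM2)
improvement = percent_improvement(mi60_density, mi25_density)

print(f"MI60: {mi60_density / 1e6:.1f} million transistors per mm2")
print(f"MI25: {mi25_density / 1e6:.1f} million transistors per mm2")
print(f"Improvement: {improvement:.0f} percent")
```

The densities come out at roughly 39.8 and 25.3 million transistors per mm², which rounds to the 58 percent improvement stated above.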

Supercharged Deep Learning Operations – Ideal for Training and Inference

We’ve made numerous improvements on these new products, including optimized deep learning operations. In addition to native half-precision (FP16) performance, the MI60 and MI50 now support INT8 and INT4 operations, delivering up to a whopping 118 TFLOPS of INT4 peak performance on the MI60. The supercharged compute capabilities of these new products are designed to meet today’s demanding system requirements of handling large data efficiently for training complex neural networks, as well as running inference against those neural networks used in deep learning.


World’s Fastest Double Precision PCIe Based Accelerator

On the other end of the compute spectrum are FP64 calculations, primarily used in high performance computing workloads. These types of workloads require extreme accuracy and speed, which the MI60 and MI50 deliver. The Radeon Instinct MI60 is the fastest double precision PCIe® based accelerator [1], delivering up to 7.4 TFLOPS of FP64 peak performance, while the MI50 is not far behind at 6.7 TFLOPS. In addition to fast FP64 performance, the MI60 and MI50 both sport full-chip ECC memory as well as RAS (reliability, availability and serviceability) features. This allows scientists and researchers across several industries, including life sciences, energy, automotive and aerospace, government and more, to achieve results with both speed and accuracy.


Finely Balanced, Ultra-Scalable Datacenter Solution

Most of the improvements we’ve talked about so far have been at the chip level, but we didn’t stop there. We also have a number of new benefits found beyond the chip. We meticulously designed the MI60 and MI50 to deliver finely tuned and balanced performance. We took a look at some of the common bottlenecks found in previous generations and made improvements to ensure your data is processed in the most efficient manner possible. This includes making these cards PCIe® Gen 4* capable, delivering up to 2x more bandwidth (64 GB/s vs. 32 GB/s) than PCIe® Gen 3 when communicating over the bus. In addition to improved performance between GPU and CPU, we’ve also built in a peer-to-peer GPU communication feature called Infinity Fabric™ Link technology. Each card includes two physical Infinity Fabric™ Links, allowing you to directly connect four GPUs together in a GPU hive ring and up to two of these hives in an 8-GPU server. Each GPU card provides up to 200 GB/s bandwidth between peer GPUs, which is up to 6x faster than PCIe Gen 3 alone [2]. We also doubled memory bandwidth from our previous generation Radeon Instinct MI25 accelerator, delivering up to 1 TB/s memory bandwidth on both the MI50 and MI60 accelerators, the first GPUs to achieve this speed.

With improved performance from both within the GPU and between GPUs and CPUs, these new finely-balanced, ultra-fast and scalable solutions are the ideal datacenter compute solution for all your needs whether they’re inference, training or HPC related.

Learn More About the AMD Radeon Instinct MI60.

Learn More About the AMD Radeon Instinct MI50.

Learn More About ROCm.

Links to third-party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied. GD-5

Footnotes:

1. Calculated on Oct 22, 2018, the Radeon Instinct MI60 GPU resulted in 7.4 TFLOPS peak theoretical double precision floating-point (FP64) performance. AMD TFLOPS calculations are performed by taking the engine clock from the highest DPM state and multiplying it by xx CUs per GPU, then multiplying that number by xx stream processors, which exist in each CU, and finally multiplying that number by 1/2 FLOPS per clock for FP64.

TFLOP calculations for MI60 can be found here.

External results on the NVIDIA Tesla V100 (16GB card) GPU accelerator resulted in 7 TFLOPS peak double precision (FP64) floating-point performance.

Results can be found here.

AMD has not independently tested or verified external/third party results/data and bears no responsibility for any errors or omissions therein.

2. As of Oct 22, 2018. Radeon Instinct™ MI50 and MI60 “Vega 7nm” technology based accelerators are PCIe® Gen 4.0* capable providing up to 64 GB/s peak theoretical transport data bandwidth from CPU to GPU per card with PCIe Gen 4.0 x16 certified servers. Previous Gen Radeon Instinct compute GPU cards are based on PCIe Gen 3.0 providing up to 32 GB/s peak theoretical transport rate bandwidth performance. Peak theoretical transport rate performance is calculated by Baud Rate * width in bytes * # directions = GB/s per card

PCIe Gen3: 8 * 2 * 2 = 32 GB/s

PCIe Gen4: 16 * 2 * 2 = 64 GB/s
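The two PCIe figures follow directly from the formula in this footnote. Here is a minimal sketch of that calculation, assuming the rate is expressed in GT/s and the width of 2 bytes corresponds to an x16 link; the function name `peak_transport_gb_s` is hypothetical.

```python
def peak_transport_gb_s(rate_gt_s, width_bytes, directions):
    """Peak theoretical transport rate per card, in GB/s.

    Implements the footnote's formula:
    baud rate * width in bytes * number of directions.
    """
    return rate_gt_s * width_bytes * directions


# Values taken from the two worked lines above.
pcie_gen3 = peak_transport_gb_s(rate_gt_s=8, width_bytes=2, directions=2)
pcie_gen4 = peak_transport_gb_s(rate_gt_s=16, width_bytes=2, directions=2)

print(f"PCIe Gen3: {pcie_gen3} GB/s")
print(f"PCIe Gen4: {pcie_gen4} GB/s")
print(f"Gen4 vs. Gen3: {pcie_gen4 / pcie_gen3:.0f}x")
```

These are peak theoretical numbers, as the footnote states, so measured throughput on real servers will be lower.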

Radeon Instinct™ MI50 and MI60 “Vega 7nm” technology based accelerators include dual Infinity Fabric™ Links providing up to 200 GB/s peak theoretical GPU to GPU or Peer-to-Peer (P2P) transport rate bandwidth performance per GPU card. Combined with PCIe Gen 4 compatibility, this provides an aggregate GPU card I/O peak bandwidth of up to 264 GB/s. Performance guidelines are estimated only and may vary. Previous Gen Radeon Instinct compute GPU cards provide up to 32 GB/s peak PCIe Gen 3.0 bandwidth performance. Infinity Fabric™ Link technology peak theoretical transport rate performance is calculated by Baud Rate * width in bytes * # directions * # links = GB/s per card

Infinity Fabric Link: 25 * 2 * 2 = 100 GB/s

MI50 and MI60 each have two links:

100 GB/s * 2 links per GPU = 200 GB/s
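The per-card Infinity Fabric Link figure, the 264 GB/s aggregate, and the "up to 6x" comparison can all be reproduced from the numbers in this footnote. The sketch below uses only those numbers; the constant names are illustrative.

```python
# Figures quoted in footnote 2.
IF_LINK_RATE = 25        # baud rate term in the footnote's formula
IF_LINK_WIDTH_BYTES = 2
DIRECTIONS = 2
LINKS_PER_GPU = 2
PCIE_GEN4_GB_S = 64      # peak PCIe Gen 4 bandwidth per card
PCIE_GEN3_GB_S = 32      # peak PCIe Gen 3 bandwidth per card

# Baud rate * width in bytes * directions gives one link.
per_link_gb_s = IF_LINK_RATE * IF_LINK_WIDTH_BYTES * DIRECTIONS
# Each GPU has two links.
per_gpu_gb_s = per_link_gb_s * LINKS_PER_GPU
# Aggregate card I/O adds the PCIe Gen 4 path to the peer-to-peer links.
aggregate_gb_s = per_gpu_gb_s + PCIE_GEN4_GB_S
# Peer-to-peer bandwidth relative to PCIe Gen 3 alone.
speedup_vs_gen3 = per_gpu_gb_s / PCIE_GEN3_GB_S

print(f"Per link: {per_link_gb_s} GB/s")
print(f"Per GPU: {per_gpu_gb_s} GB/s")
print(f"Aggregate I/O: {aggregate_gb_s} GB/s")
print(f"Versus PCIe Gen3: {speedup_vs_gen3:.2f}x")
```

The ratio works out to 6.25, which is consistent with the "up to 6x" claim in the article.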
