Today Intel announced a deep learning inference performance record on an image classification workload. Intel achieved 7878 images per second on ResNet-50 with its latest generation of Intel Xeon Scalable processors, outperforming the 7844 images per second that Nvidia reports for the Tesla V100, the best GPU performance published on Nvidia's website (which also lists the T4).
This is a significant milestone for customers who have Intel Xeon Scalable processors widely available in their clouds and data centers. Since CPUs are designed for a broad set of applications, customers can run any deep learning workload important to their business at any given time. This benchmark challenges the notion that GPUs hold a brute-strength monopoly on AI. In fact, CPUs with strong deep learning capabilities give AI customers the flexibility to manage their compute infrastructure uniformly and cost-effectively. Intel has been rapidly advancing both hardware and software in recent years to accelerate deep learning workloads.
Deep learning is used in image/video processing, natural language processing, personalized recommender systems, and reinforcement learning. The types of workloads and algorithms are rapidly expanding. A general purpose CPU is very adaptable to this dynamically changing environment.
We measured the throughput of ResNet-50 on a 2nd gen Intel Xeon Scalable processor (formerly codenamed Cascade Lake), specifically the Intel Xeon Platinum 9282, a high core-count, multi-chip-packaged server processor, using Intel Optimized Caffe. We achieved 7878 images per second by simultaneously running 28 instances, each on four cores, with batch size 11. Per Nvidia's published numbers as of the date of this publication (May 13, 2019), the Nvidia Tesla V100 delivers 7844 images per second and the Nvidia Tesla T4 delivers 4944 images per second.
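A quick back-of-the-envelope check of the multi-instance setup described above (using only the figures cited in this post):

```python
# Figures from the post: 28 simultaneous Intel Optimized Caffe instances,
# each pinned to 4 cores, batch size 11, 7878 images/second aggregate.
total_throughput = 7878        # images/second across all instances
instances = 28
cores_per_instance = 4

per_instance = total_throughput / instances
total_cores = instances * cores_per_instance

print(f"{per_instance:.1f} images/s per 4-core instance")  # ~281.4
print(f"{total_cores} cores in use")                       # 112
```

The 112 cores match a dual-socket configuration of the 56-core Platinum 9282, so the 28 four-core instances fully occupy both sockets.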
In April 2019, Intel announced the 2nd gen Intel Xeon Scalable processors with Intel Deep Learning Boost technology. This technology includes integer vector neural network instructions (VNNI), providing high throughput for 8-bit inference with a theoretical peak compute gain of 4x INT8 OPS over FP32 OPS.
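The 4x figure falls out of counting operations per 512-bit instruction; the sketch below walks through that arithmetic:

```python
# A 512-bit AVX-512 FP32 FMA instruction processes 16 lanes,
# each doing a multiply and an add: 2 ops per lane.
fp32_ops_per_instr = 16 * 2      # 32 ops

# A 512-bit VNNI VPDPBUSD instruction multiplies 64 pairs of 8-bit
# values and accumulates them into 16 int32 sums:
# 64 multiplies + 64 adds = 128 ops.
int8_ops_per_instr = 64 * 2      # 128 ops

speedup = int8_ops_per_instr // fp32_ops_per_instr
print(speedup)  # 4
```

This is the theoretical peak ratio; realized gains depend on how much of a model's compute can actually run in INT8.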
Intel Optimized Caffe is an open-source deep learning framework maintained by Intel for the broad deep learning community. We have recently added four general optimizations for INT8 inference: 1) activation memory optimization, 2) weight sharing, 3) convolution algorithm tuning, and 4) first convolution transformation.
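All four optimizations serve INT8 inference, which rests on mapping FP32 tensors onto 8-bit integers. The sketch below illustrates the general symmetric-quantization scheme such pipelines use; it is illustrative only, not the actual Intel Optimized Caffe implementation:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map [-max|x|, +max|x|] onto [-127, 127]."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
print(float(np.max(np.abs(x - x_hat))))  # small quantization error
```

The integer tensors produced this way are what VNNI instructions consume, and optimizations like weight sharing and first-convolution transformation reduce the memory and conversion overhead around them.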
In the left corner: Intel, with a dual-socket Platinum 9282 system (two 56-core, 400 W CPUs) and 768 GB of ~3 GHz DDR memory: cost about $80K, roughly 1000 W.
In the right corner: Nvidia, with a single Tesla V100 32 GB GPU: 250 W, about $8K.
Wow what a fight! I wonder who’ll win!
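The figures above make the matchup concrete. A quick back-of-the-envelope comparison (prices and power are the rough estimates quoted here, not official list figures):

```python
# Throughput, price, and power figures as cited in this post.
systems = {
    "2x Xeon Platinum 9282": {"imgs_per_s": 7878, "price_usd": 80_000, "watts": 1000},
    "1x Tesla V100 32GB":    {"imgs_per_s": 7844, "price_usd": 8_000,  "watts": 250},
}

for name, s in systems.items():
    perf_per_dollar = s["imgs_per_s"] / s["price_usd"]
    perf_per_watt = s["imgs_per_s"] / s["watts"]
    print(f"{name}: {perf_per_dollar:.3f} img/s/$, {perf_per_watt:.1f} img/s/W")
```

On these numbers the raw throughputs are nearly tied, but the GPU delivers roughly 10x the images per dollar and 4x the images per watt.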