Podcast: Accelerating AI Inference with Intel Deep Learning Boost


Jason Kennedy is Director of Datacenter Revenue Products and Marketing at Intel.

In this Chip Chat podcast, Jason Kennedy from Intel describes how Intel Deep Learning Boost works as an AI accelerator embedded in the CPU, designed to speed up deep learning inference workloads.

The key to Intel DL Boost – and its performance kick – is its augmentation of the existing Intel Advanced Vector Extensions 512 (Intel AVX-512) instruction set. This innovation significantly accelerates inference performance for deep learning workloads optimized to use Vector Neural Network Instructions (VNNI). Image classification, language translation, object detection, and speech recognition are just a few examples of workloads that can benefit. In early tests on a similar configuration, image recognition ran 11 times faster than on current-generation Intel Xeon Scalable processors as they launched in July 2017. Current projections estimate a 17-times gain in inference throughput with Intel Optimized Caffe ResNet-50 and Intel Deep Learning Boost, achievable on a new class of advanced performance CPUs debuting in the upcoming generation.
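As a rough sketch of what that instruction-set augmentation means at the intrinsics level: pre-VNNI AVX-512 code needs a three-instruction sequence to multiply-accumulate 8-bit values into 32-bit lanes, while VNNI fuses the same work into a single vpdpbusd instruction. The helper function names below are illustrative, not from the podcast:

```c
/* Illustrative comparison of int8 multiply-accumulate with and without
   AVX-512 VNNI. Compile with, e.g., gcc -O2 -mavx512bw -mavx512vnni;
   running it requires a CPU (or emulator) with AVX-512 VNNI support. */
#include <immintrin.h>

/* Pre-VNNI: three instructions per 64 u8*s8 products.
   Note that vpmaddubsw saturates its 16-bit intermediate sums. */
static inline __m512i dot_accum_legacy(__m512i acc, __m512i a_u8, __m512i b_s8)
{
    __m512i p16 = _mm512_maddubs_epi16(a_u8, b_s8);             /* vpmaddubsw */
    __m512i p32 = _mm512_madd_epi16(p16, _mm512_set1_epi16(1)); /* vpmaddwd   */
    return _mm512_add_epi32(acc, p32);                          /* vpaddd     */
}

/* With Intel DL Boost: the same multiply-accumulate in one fused
   instruction, without the intermediate 16-bit saturation. */
static inline __m512i dot_accum_vnni(__m512i acc, __m512i a_u8, __m512i b_s8)
{
    return _mm512_dpbusd_epi32(acc, a_u8, b_s8);                /* vpdpbusd   */
}
```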

For more background, this presentation by Banu Nagasundaram from Intel offers an overview of Intel's Deep Learning Boost technology, featuring the integer vector neural network instructions targeting future Intel Xeon Scalable processors. These instructions improve the throughput of multiply-add operations on int8 and int16 data types and deliver performance gains in the low-precision convolution and matrix-matrix multiplication operations used in deep neural networks. Banu walks through the 8-bit integer convolution implementation in the Intel MKL-DNN library to demonstrate how the new instructions are used in optimized code.
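To make the low-precision idea concrete, here is a minimal scalar sketch (not Intel MKL-DNN's actual implementation) of the int8 inner product that underlies such a convolution: unsigned 8-bit activations are multiplied by signed 8-bit weights, accumulated in 32-bit integers to avoid overflow, and finally scaled back to floating point. Function and parameter names are illustrative:

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar reference of the int8 inner product used in low-precision
   convolution: u8 activations times s8 weights, accumulated in s32. */
int32_t s8_dot(const uint8_t *act, const int8_t *wt, size_t n)
{
    int32_t acc = 0;                        /* 32-bit accumulator */
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)act[i] * (int32_t)wt[i];
    return acc;
}

/* Map the s32 accumulator back to fp32 using the scales chosen when the
   activations and weights were quantized (illustrative names). */
float dequantize(int32_t acc, float act_scale, float wt_scale)
{
    return (float)acc * act_scale * wt_scale;
}
```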

Download the MP3

Download the White Paper: Lower Numerical Precision Inference and Deep Learning