
NVIDIA TensorRT 6 Breaks 10 Millisecond Barrier for BERT-Large

Today, NVIDIA released TensorRT 6, which includes new capabilities that dramatically accelerate conversational AI applications, speech recognition, and 3D image segmentation for medical applications, as well as image-based applications in industrial automation. TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for AI applications. “With today’s release, TensorRT continues to expand its set of optimized layers, adds highly requested capabilities for conversational AI applications, and delivers tighter integrations with frameworks to provide an easy path to deploy your applications on NVIDIA GPUs. In TensorRT 6, we’re also releasing new optimizations that deliver inference for BERT-Large in only 5.8 ms on T4 GPUs, making it practical for enterprises to deploy this model in production for the first time.”

Video: NVIDIA Rolls Out TensorRT Hyperscale Platform and New T4 GPU for AI Datacenters

This morning at GTC Japan, NVIDIA CEO Jensen Huang announced a set of new products centered around AI and accelerated computing. Targeting hyperscale datacenters looking to run AI workloads, NVIDIA continues to innovate machine learning technologies at an unprecedented pace. “There is no question that deep learning-powered AI is being deployed around the world, and we’re seeing incredible growth here,” Huang told an audience of more than 4,000 press, partners, academics and technologists gathered on the latest stop in a GTC world tour.

High Performance Inferencing with TensorRT

Chris Gottbrath from NVIDIA gave this talk at SC17 in Denver. “This talk will introduce the TensorRT Programmable Inference Accelerator, which enables high-throughput and low-latency inference on clusters with NVIDIA V100, P100, P4 or P40 GPUs. TensorRT is both an optimizer and a runtime – users provide a trained neural network and can easily create highly efficient inference engines that can be incorporated into larger applications and services.”
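The provide-a-trained-network, get-an-inference-engine workflow described above can be sketched with TensorRT's Python API. This is a minimal, hedged illustration, not NVIDIA's reference code: it assumes the `tensorrt` package is installed and that a trained model has been exported to ONNX (the file name `model.onnx` is illustrative). On a machine without TensorRT, the sketch simply reports that and exits.

```python
"""Hedged sketch: building a TensorRT inference engine from a trained network.

Assumptions (not from the article): the `tensorrt` Python package is
available, and the trained model was exported to ONNX as "model.onnx".
"""


def build_engine(onnx_path: str):
    """Parse an ONNX model and build an optimized TensorRT engine, or
    return None when TensorRT is not available on this machine."""
    try:
        import tensorrt as trt
    except ImportError:
        print("tensorrt is not installed; skipping engine build")
        return None

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)

    # Explicit-batch networks are required for ONNX parsing.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("failed to parse ONNX model")

    # The builder config controls optimizations; FP16 is the kind of
    # reduced-precision mode used on T4-class GPUs.
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)

    # The resulting engine can be serialized, deployed, and wrapped in an
    # execution context inside a larger application or service.
    return builder.build_engine(network, config)


if __name__ == "__main__":
    engine = build_engine("model.onnx")
```

The key design point from the talk is visible in the sketch: optimization (layer fusion, precision selection) happens once at build time, so the runtime path that serves requests only executes the already-optimized engine.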