In this video from SC17 in Denver, Chris Gottbrath from NVIDIA presents: High Performance Inferencing with TensorRT.
“This talk will introduce the TensorRT Programmable Inference Accelerator which enables high throughput and low latency inference on clusters with NVIDIA V100, P100, P4 or P40 GPUs. TensorRT is both an optimizer and runtime – users provide a trained neural network and can easily create highly efficient inference engines that can be incorporated into larger applications and services. This talk will cover the capabilities, workflow, and performance of TensorRT 3.0 itself and highlight several ways that it can be used to enable organizations to add groundbreaking DL-powered features or save money as they scale out existing services.”
Chris Gottbrath is the product manager for the TensorRT programmable inference accelerator at NVIDIA. TensorRT enables users to easily deploy neural networks in data centers, automobiles, and robots, and delivers high throughput and low latency. Chris has been attending GTC since 2010. He’s always ready to talk about GPUs, deep learning, astronomy, HPC, debugging technology, and math libraries.