Inference and Quantization for AI - Session 3

Offered By: NVIDIA via YouTube

Tags

Quantization Courses
Deep Learning Courses
Neural Networks Courses
TensorFlow Courses
Object Detection Courses
Model Optimization Courses
TensorRT Courses
Hyperscale Computing Courses

Course Description

Overview

Explore advanced techniques for AI inference and quantization in this session from the NVIDIA AI Tech Workshop at NeurIPS Expo 2018. Dive into quantized inference, NVIDIA TensorRT™ 5 and TensorFlow integration, and the TensorRT Inference Server. Learn about 4-bit quantization, binary neural networks, tensor cores, and strategies for maintaining speed while optimizing accuracy. Discover post-training calibration techniques, mixed precision networks, and the benefits of per-channel scaling. Gain insights into object detection with NMS, the TensorRT hyperscale inference platform, and the NVIDIA TensorRT Inference Server's features including dynamic batching and concurrent model execution. Access additional resources and tools to enhance your AI inference capabilities.

Syllabus

Intro
OUTLINE
4-BIT QUANTIZATION
QUANTIZATION FOR INFERENCE
BINARY NEURAL NETWORKS
USING TENSOR CORES
QUANTIZED NETWORK ACCURACY
MAINTAINING SPEED AT BEST ACCURACY
SCALE-ONLY QUANTIZATION
PER-CHANNEL SCALING
TRAINING FOR QUANTIZATION
CONCLUSION
POST-TRAINING CALIBRATION
MIXED PRECISION NETWORKS
THE ROOT CAUSE
BRING YOUR OWN CALIBRATION
SUMMARY
INT PERFORMANCE
ALSO IN TensorRT
TF-TRT RELATIVE PERFORMANCE
OBJECT DETECTION - NMS
USING THE NEW NMS OP
NOW AVAILABLE ON GITHUB
TENSORRT HYPERSCALE INFERENCE PLATFORM
INEFFICIENCY LIMITS INNOVATION
NVIDIA TENSORRT INFERENCE SERVER
CURRENT FEATURES
AVAILABLE METRICS
DYNAMIC BATCHING
CONCURRENT MODEL EXECUTION - RESNET 50
NVIDIA RESEARCH AI PLAYGROUND
NV LEARN MORE AND DOWNLOAD TO USE
ADDITIONAL RESOURCES


Taught by

NVIDIA Developer
