Inference and Quantization for AI - Session 3
Offered By: NVIDIA via YouTube
Course Description
Syllabus
Intro
OUTLINE
4-BIT QUANTIZATION
QUANTIZATION FOR INFERENCE
BINARY NEURAL NETWORKS
USING TENSOR CORES
QUANTIZED NETWORK ACCURACY
MAINTAINING SPEED AT BEST ACCURACY
SCALE-ONLY QUANTIZATION
PER-CHANNEL SCALING
TRAINING FOR QUANTIZATION
CONCLUSION
POST-TRAINING CALIBRATION
MIXED PRECISION NETWORKS
THE ROOT CAUSE
BRING YOUR OWN CALIBRATION
SUMMARY
INT PERFORMANCE
ALSO IN TensorRT
TF-TRT RELATIVE PERFORMANCE
OBJECT DETECTION - NMS
USING THE NEW NMS OP
NOW AVAILABLE ON GITHUB
TENSORRT HYPERSCALE INFERENCE PLATFORM
INEFFICIENCY LIMITS INNOVATION
NVIDIA TENSORRT INFERENCE SERVER
CURRENT FEATURES
AVAILABLE METRICS
DYNAMIC BATCHING
CONCURRENT MODEL EXECUTION - RESNET-50
NVIDIA RESEARCH AI PLAYGROUND
LEARN MORE AND DOWNLOAD TO USE
ADDITIONAL RESOURCES
Taught by
NVIDIA Developer