Inference and Quantization for AI - Session 3
Offered By: NVIDIA via YouTube
Course Description
Syllabus
- Intro
- Outline
- 4-Bit Quantization
- Quantization for Inference
- Binary Neural Networks
- Using Tensor Cores
- Quantized Network Accuracy
- Maintaining Speed at Best Accuracy
- Scale-Only Quantization
- Per-Channel Scaling
- Training for Quantization
- Conclusion
- Post-Training Calibration
- Mixed Precision Networks
- The Root Cause
- Bring Your Own Calibration
- Summary
- INT Performance
- Also in TensorRT
- TF-TRT Relative Performance
- Object Detection: NMS
- Using the New NMS Op
- Now Available on GitHub
- TensorRT Hyperscale Inference Platform
- Inefficiency Limits Innovation
- NVIDIA TensorRT Inference Server
- Current Features
- Available Metrics
- Dynamic Batching
- Concurrent Model Execution: ResNet-50
- NVIDIA Research AI Playground
- NV Learn More and Download to Use
- Additional Resources
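To give a flavor of the syllabus items on scale-only quantization, per-channel scaling, and post-training (max) calibration, here is a minimal NumPy sketch. It is not NVIDIA's or TensorRT's implementation; the function names and the simple amax calibration rule are illustrative assumptions.

```python
import numpy as np

def per_channel_scales(weights, axis=0):
    # Max calibration: one symmetric, scale-only factor per output channel.
    # Reduce the absolute maximum over every dimension except the channel axis.
    reduce_axes = tuple(i for i in range(weights.ndim) if i != axis)
    amax = np.abs(weights).max(axis=reduce_axes)
    # Map [-amax, amax] onto the signed INT8 range [-127, 127].
    return amax / 127.0

def quantize(weights, scales, axis=0):
    # Broadcast each channel's scale along the remaining dimensions.
    shape = [1] * weights.ndim
    shape[axis] = -1
    s = scales.reshape(shape)
    return np.clip(np.round(weights / s), -127, 127).astype(np.int8)

def dequantize(q, scales, axis=0):
    shape = [1] * q.ndim
    shape[axis] = -1
    return q.astype(np.float32) * scales.reshape(shape)

# Toy weight tensor with 4 output channels.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
s = per_channel_scales(w)       # one scale per row (output channel)
w_q = quantize(w, s)            # INT8 representation
w_hat = dequantize(w_q, s)      # reconstruction for accuracy checks
```

Because each channel gets its own scale, a channel with small weights is not forced to share the range of a large-magnitude channel, which is the accuracy argument behind per-channel scaling; the round-to-nearest error per element stays within half a scale step.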
Taught by
NVIDIA Developer
Tags
Related Courses
- 3D Printing for Everyone (Tomsk State University via Coursera)
- Developing a Multidimensional Data Model (Microsoft via edX)
- Launching into Machine Learning, Japanese version (Google Cloud via Coursera)
- Art and Science of Machine Learning, Japanese version (Google Cloud via Coursera)
- Launching into Machine Learning, German version (Google Cloud via Coursera)