Inference and Quantization for AI - Session 3
Offered By: Nvidia via YouTube
Syllabus
Intro
OUTLINE
4-BIT QUANTIZATION
QUANTIZATION FOR INFERENCE
BINARY NEURAL NETWORKS
USING TENSOR CORES
QUANTIZED NETWORK ACCURACY
MAINTAINING SPEED AT BEST ACCURACY
SCALE-ONLY QUANTIZATION
PER-CHANNEL SCALING
TRAINING FOR QUANTIZATION
CONCLUSION
POST-TRAINING CALIBRATION
MIXED PRECISION NETWORKS
THE ROOT CAUSE
BRING YOUR OWN CALIBRATION
SUMMARY
INT PERFORMANCE
ALSO IN TensorRT
TF-TRT RELATIVE PERFORMANCE
OBJECT DETECTION - NMS
USING THE NEW NMS OP
NOW AVAILABLE ON GITHUB
TENSORRT HYPERSCALE INFERENCE PLATFORM
INEFFICIENCY LIMITS INNOVATION
NVIDIA TENSORRT INFERENCE SERVER
CURRENT FEATURES
AVAILABLE METRICS
DYNAMIC BATCHING
CONCURRENT MODEL EXECUTION - RESNET-50
NVIDIA RESEARCH AI PLAYGROUND
LEARN MORE AND DOWNLOAD TO USE
ADDITIONAL RESOURCES
Taught by
NVIDIA Developer
Related Courses
Optimize TensorFlow Models For Deployment with TensorRT (Coursera Project Network via Coursera)
Jetson Xavier NX Developer Kit - Edge AI Supercomputer Features and Applications (Nvidia via YouTube)
NVIDIA Jetson: Enabling AI-Powered Autonomous Machines at Scale (Nvidia via YouTube)
Jetson AGX Xavier: Architecture and Applications for Autonomous Machines (Nvidia via YouTube)
Streamline Deep Learning for Video Analytics with DeepStream SDK 2.0 (Nvidia via YouTube)