YoVDO

Quantization in Depth

Offered By: DeepLearning.AI via Coursera

Tags

Quantization Courses, PyTorch Courses, Model Compression Courses

Course Description

Overview

In Quantization in Depth you will build model quantization methods to shrink model weights to ¼ their original size, and apply methods to maintain the compressed model’s performance. Your ability to quantize your models can make them more accessible, and also faster at inference time. Implement and customize linear quantization from scratch so that you can study the tradeoff between space and performance, and then build a general-purpose quantizer in PyTorch that can quantize any open source model. You’ll implement techniques to compress model weights from 32 bits to 8 bits and even 2 bits.

Join this course to:

1. Build and customize linear quantization functions, choosing between two “modes”: asymmetric and symmetric; and three granularities: per-tensor, per-channel, and per-group quantization.
2. Measure the quantization error of each of these options as you balance the performance and space tradeoffs for each option.
3. Build your own quantizer in PyTorch, to quantize any open source model’s dense layers from 32 bits to 8 bits.
4. Go beyond 8 bits, and pack four 2-bit weights into one 8-bit integer.

Quantization in Depth lets you build and customize your own linear quantizer from scratch, going beyond standard open source libraries such as PyTorch and Quanto, which are covered in the short course Quantization Fundamentals, also by Hugging Face. This course gives you the foundation to study more advanced quantization methods, some of which are recommended at the end of the course.
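As a rough illustration of the first two goals in the overview, the sketch below shows per-tensor linear quantization in both modes and measures the resulting error. It is not taken from the course materials: the function names (`linear_quantize`, `linear_dequantize`, `mean_squared_error`) and the sample weights are hypothetical, and it uses plain Python lists instead of PyTorch tensors so that it runs without dependencies.

```python
def linear_quantize(values, bits=8, symmetric=False):
    """Per-tensor linear quantization of a flat list of floats.

    Returns (quantized_ints, scale, zero_point), using a signed
    integer range of [-2**(bits-1), 2**(bits-1) - 1].
    """
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1
    if symmetric:
        # Symmetric mode: map [-r_max, r_max] around a zero point of 0.
        r_max = max(abs(v) for v in values)
        scale = r_max / q_max if r_max > 0 else 1.0
        zero_point = 0
    else:
        # Asymmetric mode: map [r_min, r_max] onto the full integer range.
        r_min, r_max = min(values), max(values)
        scale = (r_max - r_min) / (q_max - q_min) if r_max > r_min else 1.0
        zero_point = round(q_min - r_min / scale)
        zero_point = max(q_min, min(q_max, zero_point))
    # Round to the nearest integer and clamp to the representable range.
    quantized = [
        max(q_min, min(q_max, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def linear_dequantize(quantized, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [scale * (q - zero_point) for q in quantized]


def mean_squared_error(original, restored):
    """Average squared difference, used here as the quantization error."""
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)


# Hypothetical weights, chosen only for illustration.
weights = [-1.3, -0.2, 0.0, 0.45, 2.1, 3.7]

for symmetric in (False, True):
    q, scale, zero_point = linear_quantize(weights, bits=8, symmetric=symmetric)
    restored = linear_dequantize(q, scale, zero_point)
    mode = "symmetric" if symmetric else "asymmetric"
    print(mode, q, mean_squared_error(weights, restored))
```

Per-channel and per-group quantization would apply the same functions to each row, or each fixed-size group of values, separately, so that every slice gets its own scale and zero point.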

Syllabus

  • Quantization in Depth
    • In Quantization in Depth you will build model quantization methods to shrink model weights to ¼ their original size, and apply methods to maintain the compressed model’s performance. Your ability to quantize your models can make them more accessible, and also faster at inference time. Implement and customize linear quantization from scratch so that you can study the tradeoff between space and performance, and then build a general-purpose quantizer in PyTorch that can quantize any open source model. You’ll implement techniques to compress model weights from 32 bits to 8 bits and even 2 bits. Join this course to: 1. Build and customize linear quantization functions, choosing between two “modes”: asymmetric and symmetric; and three granularities: per-tensor, per-channel, and per-group quantization. 2. Measure the quantization error of each of these options as you balance the performance and space tradeoffs for each option. 3. Build your own quantizer in PyTorch, to quantize any open source model’s dense layers from 32 bits to 8 bits. 4. Go beyond 8 bits, and pack four 2-bit weights into one 8-bit integer. Quantization in Depth lets you build and customize your own linear quantizer from scratch, going beyond standard open source libraries such as PyTorch and Quanto, which are covered in the short course Quantization Fundamentals, also by Hugging Face. This course gives you the foundation to study more advanced quantization methods, some of which are recommended at the end of the course.
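The fourth goal, packing four 2-bit weights into one 8-bit integer, can be sketched as follows. This is a minimal illustration rather than the course's implementation: the names `pack_2bit` and `unpack_2bit` are hypothetical, and the bit order (first value in the lowest two bits) is an assumption, since the description does not specify one.

```python
def pack_2bit(values):
    """Pack unsigned 2-bit codes (0..3) into bytes, four codes per byte.

    Assumed layout: the first code of each group occupies the lowest
    two bits, the fourth code the highest two bits.
    """
    if len(values) % 4 != 0:
        raise ValueError("number of values must be a multiple of 4")
    packed = []
    for start in range(0, len(values), 4):
        byte = 0
        for position, code in enumerate(values[start:start + 4]):
            if not 0 <= code <= 3:
                raise ValueError("each value must fit in 2 bits (0..3)")
            byte |= code << (2 * position)
        packed.append(byte)
    return packed


def unpack_2bit(packed):
    """Recover the 2-bit codes from bytes produced by pack_2bit."""
    values = []
    for byte in packed:
        for position in range(4):
            values.append((byte >> (2 * position)) & 0b11)
    return values


# Hypothetical 2-bit codes, chosen only for illustration.
codes = [1, 0, 3, 2, 2, 2, 0, 1]
packed = pack_2bit(codes)
print(packed)               # eight codes stored in two bytes
print(unpack_2bit(packed))  # round-trips to the original codes
```

Storing eight codes in two bytes instead of eight is where the 4x saving over 8-bit storage comes from; the codes must be unpacked again before they can be dequantized.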

Taught by

Younes Belkada and Marc Sun

Related Courses

Quantization Fundamentals with Hugging Face
DeepLearning.AI via Coursera
TensorFlow Lite for Edge Devices - Tutorial
freeCodeCamp
A Gentle Introduction to Sparsity with a Concrete Example
MLOps World: Machine Learning in Production via YouTube
Applying Second-Order Pruning Algorithms for SOTA Model Compression
Neural Magic via YouTube
AWQ for LLM Quantization - Efficient Inference Framework for Large Language Models
MIT HAN Lab via YouTube