Compressing Large Language Models (LLMs) with Python Code - 3 Techniques
Offered By: Shaw Talebi via YouTube
Course Description
Overview
Explore three methods for compressing Large Language Models (LLMs) - Quantization, Pruning, and Knowledge Distillation/Model Distillation - with accompanying Python code examples. Learn about the challenges of model size and the benefits of compression techniques. Follow along with a practical demonstration of combining Knowledge Distillation and Quantization to compress a BERT-based phishing classifier model. Access additional resources including a blog post, GitHub repository, pre-trained models, and dataset for further exploration of LLM compression techniques.
Syllabus
Intro -
"Bigger is Better" -
The Problem -
Model Compression -
1 Quantization -
2 Pruning -
3 Knowledge Distillation -
Example: Compressing a model with KD + Quantization -
Taught by
Shaw Talebi
Related Courses
Artificial Intelligence for RoboticsStanford University via Udacity Intro to Computer Science
University of Virginia via Udacity Design of Computer Programs
Stanford University via Udacity Web Development
Udacity Programming Languages
University of Virginia via Udacity