Compressing Large Language Models (LLMs) with Python Code - 3 Techniques
Offered By: Shaw Talebi via YouTube
Course Description
Overview
Explore three techniques for compressing Large Language Models (LLMs): Quantization, Pruning, and Knowledge Distillation (also called Model Distillation), each with accompanying Python code examples. Learn why growing model size poses deployment challenges and how compression addresses them. Follow along with a practical demonstration that combines Knowledge Distillation and Quantization to compress a BERT-based phishing classifier. Additional resources, including a blog post, GitHub repository, pre-trained models, and a dataset, are available for further exploration of LLM compression techniques.
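As a flavor of what the quantization portion covers, here is a minimal sketch of post-training dynamic quantization in PyTorch. The checkpoint name and the two labels (phishing vs. legitimate) are illustrative stand-ins, not necessarily the exact model used in the video.

import torch
from transformers import AutoModelForSequenceClassification

# Load a BERT-style classifier; "bert-base-uncased" and num_labels=2
# are assumptions for illustration.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
model.eval()

# Post-training dynamic quantization: weights of Linear layers are stored
# as int8 and dequantized on the fly, shrinking the model's memory footprint.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)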
Syllabus
Intro
"Bigger is Better"
The Problem
Model Compression
1) Quantization
2) Pruning
3) Knowledge Distillation
Example: Compressing a model with KD + Quantization
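The pruning step in the syllabus can be sketched with PyTorch's built-in pruning utilities. The layer shape and the 30% sparsity level below are illustrative choices, not values taken from the video.

import torch
import torch.nn.utils.prune as prune

# Toy layer standing in for any Linear layer inside a transformer block.
layer = torch.nn.Linear(768, 768)

# Zero out the 30% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Make the pruning permanent by removing the re-parametrization.
prune.remove(layer, "weight")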
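Likewise, the knowledge-distillation step used in the KD + Quantization example typically trains a small student to match a larger teacher. This is a minimal sketch of a standard distillation loss; the temperature and loss weighting are assumed hyperparameters, not the course's exact settings.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: match the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    # Blend the two objectives; alpha trades teacher imitation
    # against ground-truth accuracy.
    return alpha * soft + (1 - alpha) * hard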
Taught by
Shaw Talebi
Related Courses
Sentiment Analysis with Deep Learning using BERT (Coursera Project Network via Coursera)
Natural Language Processing with Attention Models (DeepLearning.AI via Coursera)
Fine Tune BERT for Text Classification with TensorFlow (Coursera Project Network via Coursera)
Deploy a BERT question answering bot on Django (Coursera Project Network via Coursera)
Generating discrete sequences: language and music (Ural Federal University via edX)