Compressing Large Language Models (LLMs) with Python Code - 3 Techniques
Offered By: Shaw Talebi via YouTube
Course Description
Overview
Explore three techniques for compressing Large Language Models (LLMs): Quantization, Pruning, and Knowledge Distillation (also called Model Distillation), each with accompanying Python code examples. Learn why growing model size poses deployment challenges and how compression addresses them. Follow along with a practical demonstration that combines Knowledge Distillation and Quantization to compress a BERT-based phishing classifier. Additional resources, including a blog post, GitHub repository, pre-trained models, and a dataset, are available for further exploration of LLM compression techniques.
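As a flavor of what the quantization portion covers, here is a minimal sketch of post-training dynamic quantization in PyTorch. The checkpoint name and the two labels (phishing vs. legitimate) are illustrative stand-ins, not necessarily the exact model used in the video.

import torch
from transformers import AutoModelForSequenceClassification

# Load a BERT-style classifier; "bert-base-uncased" and num_labels=2
# are assumptions for illustration.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
model.eval()

# Post-training dynamic quantization: weights of Linear layers are stored
# as int8 and dequantized on the fly, shrinking the model's memory footprint.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)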
Syllabus
Intro
"Bigger is Better"
The Problem
Model Compression
1) Quantization
2) Pruning
3) Knowledge Distillation
Example: Compressing a model with KD + Quantization
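The pruning step in the syllabus can be sketched with PyTorch's built-in pruning utilities. The layer shape and the 30% sparsity level below are illustrative choices, not values taken from the video.

import torch
import torch.nn.utils.prune as prune

# Toy layer standing in for any Linear layer inside a transformer block.
layer = torch.nn.Linear(768, 768)

# Zero out the 30% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Make the pruning permanent by removing the re-parametrization.
prune.remove(layer, "weight")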
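Likewise, the knowledge-distillation step used in the KD + Quantization example typically trains a small student to match a larger teacher. This is a minimal sketch of a standard distillation loss; the temperature and loss weighting are assumed hyperparameters, not the course's exact settings.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: match the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    # Blend the two objectives; alpha trades teacher imitation
    # against ground-truth accuracy.
    return alpha * soft + (1 - alpha) * hard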
Taught by
Shaw Talebi
Related Courses
Sentiment Analysis with Deep Learning using BERT (Coursera Project Network via Coursera)
Natural Language Processing with Attention Models (DeepLearning.AI via Coursera)
Fine Tune BERT for Text Classification with TensorFlow (Coursera Project Network via Coursera)
Deploy a BERT question answering bot on Django (Coursera Project Network via Coursera)
Generating discrete sequences: language and music (Ural Federal University via edX)