QLoRA - How to Fine-tune an LLM on a Single GPU with Python Code
Offered By: Shaw Talebi via YouTube
Course Description
Overview
Learn how to fine-tune a large language model (LLM) using QLoRA (Quantized Low-Rank Adaptation) on a single GPU in this 37-minute video tutorial. Explore the four key ingredients of QLoRA: 4-bit NormalFloat, Double Quantization, the Paged Optimizer, and LoRA. Follow along with example Python code to train a custom YouTube comment responder based on Mistral-7b-Instruct. Gain insights into quantization techniques, computational efficiency, and practical implementation. Access additional resources, including a series playlist, related videos, a blog post, a Colab notebook, a GitHub repository, and Hugging Face model and dataset links, for further learning and experimentation.
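The four ingredients listed above map onto standard Hugging Face configuration objects. The sketch below shows one plausible setup; the specific hyperparameter values (r, lora_alpha, target_modules, dropout) are illustrative assumptions, not necessarily the settings used in the video:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Ingredients 1 and 2: 4-bit NormalFloat weights plus Double Quantization
# (quantizing the quantization constants themselves to save more memory).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,      # Double Quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Ingredient 4: LoRA adapters trained on top of the frozen 4-bit base model.
# r=8 and the target_modules list are assumed values for illustration.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

Ingredient 3, the Paged Optimizer, is typically enabled separately when configuring training, e.g. by passing `optim="paged_adamw_32bit"` to `transformers.TrainingArguments`.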
Syllabus
Intro
Fine-tuning recap
LLMs are computationally expensive
What is Quantization?
4 Ingredients of QLoRA
Ingredient 1: 4-bit NormalFloat
Ingredient 2: Double Quantization
Ingredient 3: Paged Optimizer
Ingredient 4: LoRA
Bringing it all together
Example code: Fine-tuning Mistral-7b-Instruct for YT Comments
What's Next?
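The "What is Quantization?" topic in the syllabus can be illustrated with a minimal sketch of absmax quantization, one simple scheme for mapping floating-point weights to a small signed-integer range (this is a generic illustration, not the exact scheme QLoRA uses for NF4):

```python
def absmax_quantize(values, bits=8):
    """Map floats to signed integers by scaling so the largest
    absolute value lands on the integer range boundary."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]

weights = [0.5, -1.2, 0.03, 1.2]
q, s = absmax_quantize(weights)             # q holds small integers
approx = dequantize(q, s)                   # close to the originals
```

Each value is stored as a single small integer plus one shared scale factor, which is where the memory savings come from; the price is a small rounding error visible in `approx`.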
Taught by
Shaw Talebi
Related Courses
Fine-Tuning LLM with QLoRA on Single GPU - Training Falcon-7b on ChatBot Support FAQ Dataset (Venelin Valkov via YouTube)
Deploy LLM to Production on Single GPU - REST API for Falcon 7B with QLoRA on Inference Endpoints (Venelin Valkov via YouTube)
Building an LLM Fine-Tuning Dataset - From Reddit Comments to QLoRA Training (sentdex via YouTube)
Generative AI: Fine-Tuning LLM Models Crash Course (Krish Naik via YouTube)
Aligning Open Language Models - Stanford CS25 Lecture (Stanford University via YouTube)