Multi-GPU Fine-tuning with DDP and FSDP

Offered By: Trelis Research via YouTube

Tags

PyTorch Courses
LoRA (Low-Rank Adaptation) Courses
Quantization Courses

Course Description

Overview

Dive into the world of multi-GPU fine-tuning with this comprehensive tutorial on Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP) techniques. Learn how to optimize VRAM usage, understand the intricacies of the Adam optimizer, and explore the trade-offs between various distributed training methods. Gain practical insights on choosing the right GPU setup, implementing LoRA and quantization for VRAM reduction, and utilizing tools like DeepSpeed and Accelerate. Follow along with code examples for Model Parallel, DDP, and FSDP implementations, and discover how to set up and use rented GPUs via SSH. By the end of this tutorial, you'll be equipped with the knowledge to efficiently fine-tune large language models across multiple GPUs.
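
The description's point about the Adam optimizer driving VRAM requirements is easy to make concrete. As a rough sketch (a commonly cited rule of thumb, not figures from the video): full fine-tuning in mixed precision costs about 16 bytes per parameter before activations, because Adam keeps an fp32 master copy of the weights plus two fp32 moment estimates alongside the bf16 weights and gradients.

    # Back-of-the-envelope VRAM for full fine-tuning with Adam in mixed
    # precision. Per-parameter costs are the usual rule of thumb, not
    # measurements from the course.
    def training_vram_gb(num_params: float) -> float:
        bytes_per_param = (
            2    # bf16 weights
            + 2  # bf16 gradients
            + 4  # fp32 master copy of the weights
            + 8  # Adam first and second moments, fp32 each
        )
        return num_params * bytes_per_param / 1e9

    # Roughly 112 GB for a 7B-parameter model before any activations,
    # which is the motivation for LoRA, quantization and sharding.
    print(f"{training_vram_gb(7e9):.0f} GB")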

Syllabus

Multi-GPU Distributed Training
Video Overview
Choosing a GPU setup
Understanding VRAM requirements in detail
Understanding Optimization and Gradient Descent
How does the Adam optimizer work?
How the Adam optimizer affects VRAM requirements
Effect of activations, model context and batch size on VRAM
Tip for GPU setup - start with a small batch size
Reducing VRAM with LoRA and quantization
Quality trade-offs with quantization and LoRA
Choosing between MP, DDP or FSDP
Distributed Data Parallel
Model Parallel and Fully Sharded Data Parallel (FSDP)
Trade-offs with DDP and FSDP
How does DeepSpeed compare to FSDP?
Using FSDP and DeepSpeed with Accelerate
Code examples for MP, DDP and FSDP
Using SSH with rented GPUs (Runpod)
Installation
Slight detour: Setting a username and email for GitHub
Basic Model Parallel (MP) fine-tuning script
Fine-tuning script with Distributed Data Parallel (DDP)
Fine-tuning script with Fully Sharded Data Parallel (FSDP)
Running ‘accelerate config’ for FSDP
Saving a model after FSDP fine-tuning
Quick demo of a complete FSDP LoRA training script
Quick demo of an inference script after training
Wrap up
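
To make the MP, DDP and FSDP chapters above concrete, here are minimal PyTorch sketches with stand-in models; they illustrate the mechanics, not the course's actual scripts. Basic Model Parallel simply places different layers on different GPUs, so activations hop between devices and the GPUs compute one after another rather than in parallel:

    import torch

    # Naive model parallelism: layers are split across two GPUs and the
    # activations are moved between them; only one GPU computes at a time
    class TwoGPUNet(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.first = torch.nn.Linear(512, 512).to("cuda:0")
            self.second = torch.nn.Linear(512, 512).to("cuda:1")

        def forward(self, x):
            x = self.first(x.to("cuda:0"))
            return self.second(x.to("cuda:1"))

    model = TwoGPUNet()
    out = model(torch.randn(8, 512))  # output lands on cuda:1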
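DDP instead gives every GPU a full copy of the model and a distinct shard of the data, synchronizing only gradients. A runnable skeleton (stand-in model and random data), launched with e.g. torchrun --nproc_per_node=2 train_ddp.py:

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    def main():
        # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process
        dist.init_process_group("nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Stand-in model; a real run would load an LLM and tokenized text
        model = torch.nn.Linear(512, 512).cuda()
        model = DDP(model, device_ids=[local_rank])

        dataset = TensorDataset(torch.randn(1024, 512), torch.randn(1024, 512))
        sampler = DistributedSampler(dataset)  # distinct shard per rank
        loader = DataLoader(dataset, batch_size=8, sampler=sampler)

        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
        loss_fn = torch.nn.MSELoss()

        for epoch in range(2):
            sampler.set_epoch(epoch)  # reshuffle the shards each epoch
            for x, y in loader:
                x, y = x.cuda(), y.cuda()
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()  # DDP all-reduces grads here
                optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()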
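FSDP goes further and shards the parameters, gradients and optimizer state themselves across ranks, which is what makes models larger than a single GPU's VRAM trainable. The video drives this through 'accelerate config' and accelerate launch; underneath, the sharding and the rank-0 state-dict gathering (the 'Saving a model after FSDP fine-tuning' step) look roughly like this sketch, launched with torchrun in the same way:

    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp import StateDictType, FullStateDictConfig

    def main():
        dist.init_process_group("nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Stand-in model; FSDP shards parameters, gradients and optimizer
        # state across ranks instead of replicating them as DDP does
        model = FSDP(torch.nn.Linear(512, 512).cuda())
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

        # One toy training step
        x = torch.randn(8, 512, device="cuda")
        model(x).square().mean().backward()
        optimizer.step()

        # Gather the sharded weights onto rank 0 before saving
        cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
        with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
            state = model.state_dict()
        if dist.get_rank() == 0:
            torch.save(state, "model.pt")

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()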


Taught by

Trelis Research
