Full Fine-tuning LLMs with Lower VRAM: Optimizers, GaLore, and Advanced Techniques
Offered By: Trelis Research via YouTube
Course Description
Overview
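This video from Trelis Research covers full fine-tuning of large language models on limited VRAM. It compares optimizers (SGD, AdamW, AdamW 8-bit, Adafactor) and their memory requirements, introduces GaLore and subspace descent for cutting gradient and optimizer VRAM, demonstrates layerwise gradient updates and gradient checkpointing, compares the performance of each approach in the provided training scripts, and closes with inference, pushing models to the Hub, and single- and multi-GPU recommendations.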
Syllabus
LLM Full fine-tuning with lower VRAM
Video Overview
Understanding Optimizers
Stochastic Gradient Descent (SGD)
AdamW Optimizer and VRAM requirements
AdamW 8-bit optimizer
Adafactor optimizer and memory requirements
GaLore - reducing gradient and optimizer VRAM
LoRA versus GaLore
Better and Faster GaLore via Subspace Descent
Layerwise gradient updates
Training Scripts
How gradient checkpointing works to reduce memory
AdamW Performance
AdamW 8-bit Performance
Adafactor with manual learning rate and schedule
Adafactor with default/auto learning rate
GaLore AdamW
GaLore AdamW with Subspace Descent
Using AdamW 8-bit and Adafactor with GaLore
Notebook demo of layerwise gradient updates
Running with LoRA
Inference and Pushing Models to the Hub
Single GPU Recommendations
Multi-GPU Recommendations
Resources
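As an illustrative aside on the optimizer and VRAM chapters above: the sketch below is a back-of-the-envelope estimate (my own assumption of bf16 weights and gradients with fp32 AdamW moments, not figures from the video) of why AdamW dominates memory in full fine-tuning.

```python
# Back-of-the-envelope VRAM estimate for full fine-tuning with AdamW.
# Assumptions (mine, not the video's figures): bf16 weights and gradients,
# fp32 first and second moments (m, v); activations and overhead excluded.

def adamw_vram_estimate_gb(num_params: float) -> dict:
    gb = 1e9
    weights = 2 * num_params / gb                  # bf16: 2 bytes per parameter
    gradients = 2 * num_params / gb                # bf16: 2 bytes per parameter
    optimizer_states = (4 + 4) * num_params / gb   # fp32 m and v: 8 bytes per parameter
    return {
        "weights_gb": weights,
        "gradients_gb": gradients,
        "optimizer_states_gb": optimizer_states,
        "total_gb": weights + gradients + optimizer_states,
    }

# Example: a hypothetical 7B-parameter model comes to roughly 84 GB before
# activations, with two thirds of that being AdamW's optimizer states.
print(adamw_vram_estimate_gb(7e9))
```

Switching to an 8-bit optimizer roughly quarters the optimizer-state term, and GaLore shrinks it further by keeping optimizer states only in a low-rank subspace of the gradients.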
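For the training-script chapters, one common way to switch between these optimizers is through the Hugging Face transformers Trainer; the sketch below is an assumption about tooling (a recent transformers release plus galore-torch for GaLore and bitsandbytes for 8-bit AdamW), not necessarily the scripts used in the video.

```python
from transformers import TrainingArguments

# Illustrative settings only; model, tokenizer, dataset, and Trainer are omitted.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    gradient_checkpointing=True,           # recompute activations to save memory
    optim="galore_adamw",                  # also: "adamw_torch", "adamw_bnb_8bit",
                                           # "adafactor", "galore_adamw_8bit",
                                           # "galore_adamw_layerwise"
    optim_target_modules=["attn", "mlp"],  # GaLore only: which layers to project
)
```

The "galore_adamw_layerwise" variant steps each layer as soon as its gradient is ready, which lowers peak memory further at the cost of some flexibility in the training loop.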
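On the two Adafactor chapters (manual versus default/auto learning rate), a minimal sketch using the Adafactor implementation shipped with transformers; the toy nn.Linear stand-in is mine, not the course's model.

```python
import torch
from transformers.optimization import Adafactor, AdafactorSchedule

# Toy stand-in so the snippet runs; substitute the actual model being fine-tuned.
model = torch.nn.Linear(16, 16)

# (a) Manual learning rate, paired with an external schedule of your choosing.
manual_optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)

# (b) Default/automatic relative-step learning rate.
auto_optimizer = Adafactor(
    model.parameters(),
    lr=None,
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
)
auto_schedule = AdafactorSchedule(auto_optimizer)  # exposes the internal lr for logging
```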
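On the layerwise gradient updates chapter, a minimal sketch of the underlying idea using plain PyTorch hooks and a per-parameter AdamW (an assumption for illustration; GaLore's layerwise mode uses its own projected optimizer): each parameter is updated and its gradient freed as soon as it is accumulated, so the full set of gradients never sits in memory at once.

```python
import torch

# Requires PyTorch >= 2.1 for register_post_accumulate_grad_hook.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.Linear(32, 16))

# One optimizer per parameter (plain AdamW here, standing in for a projected optimizer).
per_param_opts = {p: torch.optim.AdamW([p], lr=1e-4) for p in model.parameters()}

def step_and_free(param: torch.Tensor) -> None:
    opt = per_param_opts[param]
    opt.step()                        # update this parameter as soon as its grad is ready
    opt.zero_grad(set_to_none=True)   # free the gradient immediately

for p in model.parameters():
    p.register_post_accumulate_grad_hook(step_and_free)

# Training then only calls backward(); there is no global optimizer.step().
x = torch.randn(4, 16)
loss = model(x).pow(2).mean()
loss.backward()
```

Because the update happens inside the backward pass, this pattern does not combine naturally with gradient accumulation.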
Taught by
Trelis Research
Related Courses
How to Do Stable Diffusion LORA Training by Using Web UI on Different Models (Software Engineering Courses - SE Courses via YouTube)
MicroPython & WiFi (Kevin McAleer via YouTube)
Building a Wireless Community Sensor Network with LoRa (Hackaday via YouTube)
ComfyUI - Node Based Stable Diffusion UI (Olivio Sarikas via YouTube)
AI Masterclass for Everyone - Stable Diffusion, ControlNet, Depth Map, LORA, and VR (Hugh Hou via YouTube)