Direct Preference Optimization - Fine-Tuning LLMs Without Reinforcement Learning
Offered By: Serrano.Academy via YouTube
Course Description
Overview
Explore the innovative Direct Preference Optimization (DPO) method for training Large Language Models in this 21-minute video tutorial. Discover how DPO offers a more effective and efficient alternative to reinforcement learning techniques. Delve into key concepts such as the Bradley-Terry Model, KL Divergence, and the Loss Function. Compare DPO with Reinforcement Learning from Human Feedback (RLHF) to understand its advantages. As the third installment in a four-part series on reinforcement learning methods for LLMs, this video provides valuable insights for those interested in advanced machine learning techniques. Access additional resources, including related videos in the series and a recommended book on machine learning, to further enhance your understanding of LLM training methodologies.
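To make the concepts concrete, here is a minimal sketch of the per-pair DPO loss the video's syllabus covers, combining the Bradley-Terry preference model with the implicit KL constraint controlled by beta. The function name and argument names are illustrative, not from the video; the inputs are assumed to be total log-probabilities of the chosen and rejected responses under the trained policy and a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Hypothetical sketch of the DPO loss for one preference pair.

    logp_* are log-probs under the policy being trained; ref_logp_* are
    log-probs under the frozen reference model. beta scales the implicit
    KL penalty that keeps the policy close to the reference.
    """
    # Log-ratios of policy to reference for each response
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    # Bradley-Terry margin: how strongly the policy prefers the chosen response
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin), written via log1p for numerical stability
    return math.log1p(math.exp(-margin))
```

When the policy matches the reference on both responses the margin is zero and the loss is log 2; increasing the policy's relative likelihood of the chosen response drives the loss down, which is the gradient signal DPO trains on.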
Syllabus
Introduction
RLHF vs DPO
The Bradley-Terry Model
KL Divergence
The Loss Function
Conclusion
Taught by
Serrano.Academy
Related Courses
Mastering ChatGPT (AI) and PowerPoint presentation (Udemy)
Reinforcement Learning with TorchRL and TensorDict - NeurIPS Hacker Cup AI (Weights & Biases via YouTube)
Reinforcement Learning from Human Feedback (RLHF) Explained (IBM via YouTube)
PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switching (USENIX via YouTube)
Scalable and Flexible Distributed Reinforcement Learning Systems (Finnish Center for Artificial Intelligence FCAI via YouTube)