
Direct Preference Optimization - Fine-Tuning LLMs Without Reinforcement Learning

Offered By: Serrano.Academy via YouTube

Tags

Machine Learning Courses, Reinforcement Learning Courses, Loss Functions Courses, Model Training Courses, KL Divergence Courses, RLHF Courses

Course Description

Overview

Explore Direct Preference Optimization (DPO), a method for fine-tuning Large Language Models, in this 21-minute video tutorial. The video presents DPO as a more effective and efficient alternative to reinforcement learning techniques. It covers the Bradley-Terry model, KL divergence, and the loss function, and compares DPO with Reinforcement Learning from Human Feedback (RLHF) to explain its advantages. This is the third installment in a four-part series on reinforcement learning methods for LLMs. Additional resources include the related videos in the series and a recommended book on machine learning.

Syllabus

Introduction
RLHF vs DPO
The Bradley-Terry Model
KL Divergence
The Loss Function
Conclusion


Taught by

Serrano.Academy

Related Courses

How Google does Machine Learning en Español
Google Cloud via Coursera
Creating Custom Callbacks in Keras
Coursera Project Network via Coursera
Automatic Machine Learning with H2O AutoML and Python
Coursera Project Network via Coursera
AI in Healthcare Capstone
Stanford University via Coursera
AutoML con Pycaret y TPOT
Coursera Project Network via Coursera