Direct Preference Optimization (DPO) - Advanced Fine-Tuning Technique
Offered By: Trelis Research via YouTube
Course Description
Overview
Explore Direct Preference Optimization (DPO), a technique for fine-tuning language models directly on pairs of preferred and rejected responses, in this 43-minute video tutorial by Trelis Research. Learn how DPO differs from standard supervised fine-tuning and how it compares to RLHF, which first trains a separate reward model. Work through practical examples using datasets such as UltraChat and Anthropic's Helpful and Harmless (HH). Follow a detailed run-through of a DPO notebook, interpret the evaluation results logged to Weights and Biases, and set up Runpod for a one-epoch training run. The tutorial also provides supporting resources, including Google Slides, datasets, and scripts, for applying DPO in your own fine-tuning projects.
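As a rough illustration of what "fine-tuning on preference pairs" means, here is a minimal sketch of the per-pair DPO loss from the original DPO paper (Rafailov et al., 2023), not the code used in the video: loss = -log σ(β · [(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where y_w is the chosen response, y_l the rejected one, π_θ the model being trained and π_ref a frozen reference (typically the SFT model). The function names and the β = 0.1 value are illustrative assumptions; the inputs are assumed to be summed log-probabilities of each full response given the prompt.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    # log sigma(z) = -log(1 + e^{-z}); branch on the sign so exp() never overflows
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a whole response
    (chosen or rejected) given the prompt, under the trainable policy
    or the frozen reference model. beta=0.1 is only an example value.
    """
    # How much more (or less) likely each response became relative to the reference
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Reward the policy for widening the gap in favour of the chosen response
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# Before any training the policy equals the reference, so the margin is 0
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # prints 0.6931 (= log 2)
```

Raising the chosen response's log-probability relative to the reference pushes the loss below log 2, while raising the rejected one pushes it above; no separate reward model is involved, which is the main practical contrast with RLHF.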
Syllabus
Direct Preference Optimisation
Video Overview
How does “normal” fine-tuning work?
How does DPO work?
DPO Datasets: UltraChat
DPO Datasets: Helpful and Harmless
DPO vs RLHF
Required datasets and SFT models
DPO Notebook Run through
DPO Evaluation Results
Weights and Biases Results Interpretation
Runpod Setup for a 1-Epoch Training Run
Resources
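DPO needs each example split into a prompt, a chosen response and a rejected response. The sketch below assumes the public Helpful and Harmless release (`Anthropic/hh-rlhf` on the Hugging Face Hub), where each row stores two full dialogues, `chosen` and `rejected`, written as `\n\nHuman:` / `\n\nAssistant:` turns that share everything up to the final assistant reply. The helper name `split_hh_example` is hypothetical, and the video's notebook may preprocess the data differently.

```python
def split_hh_example(chosen: str, rejected: str) -> dict:
    """Split an HH-style pair of full dialogues into prompt/chosen/rejected.

    Hypothetical helper: assumes both dialogues share the same text up to
    and including the last "\n\nAssistant:" marker of the chosen dialogue.
    """
    marker = "\n\nAssistant:"
    idx = chosen.rfind(marker)
    if idx == -1:
        raise ValueError("no assistant turn found in chosen dialogue")
    # Everything up to and including the final assistant marker is the prompt
    prompt = chosen[: idx + len(marker)]
    if not rejected.startswith(prompt):
        raise ValueError("chosen and rejected dialogues do not share a prompt")
    return {
        "prompt": prompt,
        "chosen": chosen[len(prompt):],
        "rejected": rejected[len(prompt):],
    }


pair = split_hh_example(
    "\n\nHuman: What is 2+2?\n\nAssistant: 4.",
    "\n\nHuman: What is 2+2?\n\nAssistant: 5.",
)
print(repr(pair["chosen"]), repr(pair["rejected"]))  # prints ' 4.' ' 5.'
```

Keeping the shared prefix as a single prompt means the loss compares only the two final replies, which is the quantity DPO is meant to rank.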
Taught by
Trelis Research