Direct Preference Optimization (DPO) - Advanced Fine-Tuning Technique
Offered By: Trelis Research via YouTube
Course Description
Overview
Explore Direct Preference Optimization (DPO), a technique for fine-tuning language models on preference data, in this 43-minute video tutorial by Trelis Research. Learn how DPO differs from traditional fine-tuning and how it compares to reinforcement learning from human feedback (RLHF). Review the datasets covered, including UltraChat and Anthropic's Helpful and Harmless. Follow a detailed run-through of the DPO notebook, interpret evaluation results using Weights and Biases, and set up Runpod for a one-epoch training run. The accompanying resources include Google Slides, datasets, and scripts to support implementing DPO in your own fine-tuning projects.
Syllabus
Direct Preference Optimisation
Video Overview
How does “normal” fine-tuning work?
How does DPO work?
DPO Datasets: UltraChat
DPO Datasets: Helpful and Harmless
DPO vs RLHF
Required datasets and SFT models
DPO Notebook Run through
DPO Evaluation Results
Weights and Biases Results Interpretation
Runpod Setup for 1 epoch Training Run
Resources
Taught by
Trelis Research