YoVDO

Direct Preference Optimization (DPO) - Advanced Fine-Tuning Technique

Offered By: Trelis Research via YouTube

Tags

Machine Learning Courses
Fine-Tuning Courses
Hugging Face Courses
RLHF Courses

Course Description

Overview

Explore Direct Preference Optimization (DPO), a technique for fine-tuning language models on preference pairs, in this 43-minute video tutorial by Trelis Research. Learn how DPO differs from standard supervised fine-tuning and how it compares to RLHF. See how preference datasets such as UltraChat and Anthropic's Helpful and Harmless are used. Follow a detailed run-through of the DPO notebook, interpret the evaluation results in Weights and Biases, and set up Runpod for a one-epoch training run. The accompanying resources include Google Slides, datasets, and scripts for applying DPO in your own fine-tuning projects.

Syllabus

Direct Preference Optimisation
Video Overview
How does “normal” fine-tuning work?
How does DPO work?
DPO Datasets: UltraChat
DPO Datasets: Helpful and Harmless
DPO vs RLHF
Required datasets and SFT models
DPO Notebook Run-through
DPO Evaluation Results
Weights and Biases Results Interpretation
Runpod Setup for 1 epoch Training Run
Resources
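
For orientation on the "How does DPO work?" item, here is a minimal sketch of the DPO loss for a single preference pair, following the standard formulation from the DPO paper. It is not taken from the video or its notebook. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are the summed log-probabilities that the trained model (policy) and a frozen reference model assign to the chosen and the rejected response.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair; hypothetical helper for illustration."""
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favours the chosen response
    # more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy shifted toward the chosen response: loss drops below the baseline.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

Unlike RLHF, this objective needs no separately trained reward model: the preference signal enters only through the log-probability ratios against the reference model.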


Taught by

Trelis Research

Related Courses

4.0 Shades of Digitalisation for the Chemical and Process Industries
University of Padova via FutureLearn
A Day in the Life of a Data Engineer
Amazon Web Services via AWS Skill Builder
FinTech for Finance and Business Leaders
ACCA via edX
Accounting Data Analytics
University of Illinois at Urbana-Champaign via Coursera
Accounting Data Analytics
Coursera