YoVDO

Direct Preference Optimization (DPO): How It Works and How It Topped an LLM Eval Leaderboard

Offered By: Snorkel AI via YouTube

Tags

Machine Learning Courses Reinforcement Learning Courses Model Evaluation Courses RLHF Courses Snorkel AI Courses

Course Description

Overview

Explore Direct Preference Optimization (DPO), an approach for aligning large language models (LLMs) with user preferences, in this 12-minute interview with Snorkel AI researcher Hoang Tran. Learn how DPO topped the AlpacaEval leaderboard and subsequently influenced changes in LLM evaluation methods. Discover the key differences between DPO and Reinforcement Learning from Human Feedback (RLHF), and understand why DPO is considered more stable and computationally efficient. Gain insights into the future of LLM evaluation and how DPO can benefit enterprises in building better language models. This video is ideal for machine learning engineers, NLP researchers, and anyone interested in advances in AI. Delve deeper into Tran's DPO efforts through the provided blog post link and explore more AI research talks in the linked playlist.
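For a concrete sense of the objective the interview discusses, the sketch below shows the standard DPO loss in PyTorch: a binary classification objective over preference pairs that compares the trained policy against a frozen reference model, with no separate reward model or RL loop as in RLHF. This is a minimal illustration, not code from the video; the function name, the beta value, and the dummy log-probabilities are all assumptions made here for demonstration.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Inputs are summed log-probabilities of the chosen (preferred) and
    # rejected responses under the policy being trained and under a frozen
    # reference model (illustrative names, not from the talk).
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
    # DPO maximizes the log-sigmoid of the gap between the two implicit
    # rewards, i.e. a simple classification loss over preference pairs.
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

# Dummy example with a batch of two preference pairs (made-up values).
policy_chosen = torch.tensor([-12.3, -10.1])
policy_rejected = torch.tensor([-14.8, -11.0])
ref_chosen = torch.tensor([-12.0, -10.5])
ref_rejected = torch.tensor([-13.9, -10.9])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))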

Syllabus

Direct Preference Optimization (DPO): How It Works and How It Topped an LLM Eval Leaderboard


Taught by

Snorkel AI

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent