
Learning to Summarize from Human Feedback

Offered By: Yannic Kilcher via YouTube

Tags

Supervised Learning Courses
Machine Learning Courses
Reinforcement Learning Courses

Course Description

Overview

Explore an in-depth analysis of OpenAI's paper on improving text summarization through human feedback in this 46-minute video. Dive into the challenges of training and evaluating summarization models, learn about the limitations of traditional metrics like ROUGE, and discover how incorporating direct human feedback can significantly enhance summary quality. Examine the approach of training a reward model as a proxy for human preferences and optimizing against it with reinforcement learning, producing summaries that humans prefer over the human-written reference summaries. Follow along as the video breaks down key concepts, methodologies, and results, including the application to Reddit posts and transfer to news articles. Gain insights into the broader implications of this research for machine learning and natural language processing.
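The two training signals described above can be sketched in a few lines. This is a hypothetical illustration of the general recipe (pairwise reward-model loss plus a KL-penalized RL reward), not OpenAI's implementation; the function names, scalar inputs, and the `beta` value are assumptions for clarity.

```python
import math

def reward_model_loss(score_preferred, score_rejected):
    # Pairwise preference loss: the reward model should assign a higher
    # score to the summary the human labeler preferred. This is
    # -log(sigmoid(score_preferred - score_rejected)).
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def rl_objective(reward_score, logprob_policy, logprob_sft, beta=0.05):
    # KL-penalized reward for the RL step: maximize the reward model's
    # score while a per-sample KL estimate (log pi_RL - log pi_SFT)
    # keeps the policy close to the supervised baseline.
    kl_estimate = logprob_policy - logprob_sft
    return reward_score - beta * kl_estimate
```

The KL term is what the video's "KL Constraint & Connection to Adversarial Examples" segment discusses: without it, the policy can drift into outputs that exploit the reward model rather than genuinely improve summaries.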

Syllabus

- Intro & Overview
- Summarization as a Task
- Problems with the ROUGE Metric
- Training Supervised Models
- Main Results
- Including Human Feedback with Reward Models & RL
- The Unknown Effect of Better Data
- KL Constraint & Connection to Adversarial Examples
- More Results
- Understanding the Reward Model
- Limitations & Broader Impact


Taught by

Yannic Kilcher

Related Courses

Computational Neuroscience
University of Washington via Coursera
Reinforcement Learning
Brown University via Udacity
Reinforcement Learning
Indian Institute of Technology Madras via Swayam
FA17: Machine Learning
Georgia Institute of Technology via edX
Introduction to Reinforcement Learning
Higher School of Economics via Coursera