Extracting Training Data from Large Language Models - Paper Explained

Offered By: Yannic Kilcher via YouTube

Tags

Language Models Courses, Artificial Intelligence Courses, Data Privacy Courses

Course Description

Overview

Explore a video analysis of a research paper that demonstrates a method for extracting verbatim training data from large language models. Follow the presenter through the paper's methodology, findings, and results, and consider the security and privacy implications for models like GPT-3. Learn about eidetic memorization in language models, the adversary's objectives, and the two-step extraction method. Examine the analysis of the main results, including the greater vulnerability of larger models, and review the proposed mitigation strategies. Gain insight into the ethical concerns around publishing large language models trained on private datasets and the risk of exposing personally identifiable information.

Syllabus

- Intro & Overview
- Personal Data Example
- Eidetic Memorization & Language Models
- Adversary's Objective & Outlier Data
- Ethical Hedging
- Two-Step Method Overview
- Perplexity Baseline
- Improvement via Perplexity Ratios
- Weights for Patterns & Weights for Memorization
- Analysis of Main Results
- Mitigation Strategies
- Conclusion & Comments


Taught by

Yannic Kilcher

Related Courses

- Microsoft Bot Framework and Conversation as a Platform (Microsoft via edX)
- Unlocking the Power of OpenAI for Startups - Microsoft for Startups (Microsoft via YouTube)
- Improving Customer Experiences with Speech to Text and Text to Speech (Microsoft via YouTube)
- Stanford Seminar - Deep Learning in Speech Recognition (Stanford University via YouTube)
- Select Topics in Python: Natural Language Processing (Codio via Coursera)