YoVDO

Extracting Training Data from Large Language Models - Paper Explained

Offered By: Yannic Kilcher via YouTube

Tags

Language Models Courses, Artificial Intelligence Courses, Data Privacy Courses

Course Description

Overview

A video walkthrough of a research paper that demonstrates a method for extracting verbatim training data from large language models. The presenter breaks down the paper's methodology, findings, and results, and discusses the security and privacy implications for models like GPT-3. Topics include eidetic memorization in language models, the adversary's objectives, and the two-step extraction method. The analysis of the main results covers the finding that larger models are more vulnerable, followed by proposed mitigation strategies. The video also considers the ethical concerns of publishing large language models trained on private datasets and the risk of exposing personally identifiable information.

Syllabus

- Intro & Overview
- Personal Data Example
- Eidetic Memorization & Language Models
- Adversary's Objective & Outlier Data
- Ethical Hedging
- Two-Step Method Overview
- Perplexity Baseline
- Improvement via Perplexity Ratios
- Weights for Patterns & Weights for Memorization
- Analysis of Main Results
- Mitigation Strategies
- Conclusion & Comments
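
The "Perplexity Baseline" and "Improvement via Perplexity Ratios" items can be illustrated with a rough sketch. It assumes the model has already generated candidate samples and that their per-token log-probabilities are available, and it uses zlib-compressed size as the reference score for the ratio, which is one possible choice and an assumption here. The sample texts, names, and log-probability values are invented for illustration and do not come from the video.

```python
import math
import zlib


def perplexity(token_log_probs):
    """exp of the negative mean per-token natural-log probability."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def zlib_bits(text):
    """Bits in the zlib-compressed UTF-8 text: a model-free reference score."""
    return 8 * len(zlib.compress(text.encode("utf-8")))


def rank_by_perplexity(samples):
    """Baseline: most confidently generated (lowest perplexity) first."""
    return sorted(samples, key=lambda s: perplexity(s["log_probs"]))


def rank_by_ratio(samples):
    """Ratio variant: low log-perplexity relative to zlib size first."""
    return sorted(
        samples,
        key=lambda s: math.log(perplexity(s["log_probs"])) / zlib_bits(s["text"]),
    )


# Hypothetical samples; the log-probabilities are made up, not model output.
SAMPLES = [
    {
        "name": "repetitive",
        "text": "ha ha ha ha ha ha ha ha ha ha ha ha",
        "log_probs": [-0.1] * 12,
    },
    {
        "name": "memorized-looking",
        "text": "Contact: J. Doe, 42 Example Road, Springfield, +1 555 0100",
        "log_probs": [-0.15] * 12,
    },
    {
        "name": "ordinary",
        "text": "The weather was nice so we walked to the park today.",
        "log_probs": [-3.0] * 12,
    },
]

print([s["name"] for s in rank_by_perplexity(SAMPLES)])
print([s["name"] for s in rank_by_ratio(SAMPLES)])
```

In this toy data the repetitive sample tops the baseline ranking because trivially repeated text is easy to predict. The ratio ranking moves the memorized-looking sample to the top, since its text compresses poorly under zlib yet still receives high model confidence.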


Taught by

Yannic Kilcher

Related Courses

- Building Language Models on AWS (Japanese): Amazon Web Services via AWS Skill Builder
- Building Language Models on AWS (Korean): Amazon Web Services via AWS Skill Builder
- Building Language Models on AWS (Simplified Chinese): Amazon Web Services via AWS Skill Builder
- Building Language Models on AWS (Traditional Chinese): Amazon Web Services via AWS Skill Builder
- Introduction to ChatGPT: edX