YoVDO

Why GenAI Needs Careful Training Data Management

Offered By: Snorkel AI via YouTube

Tags

Generative AI Courses, GPT-4 Courses, Domain Adaptation Courses, Low-Resource Languages Courses, Instruction-Tuning Courses, Snorkel AI Courses

Course Description

Overview

Explore why managing training data matters for large language models in this 18-minute talk by Stephen Bach, assistant professor of computer science at Brown University. Learn the three sequential stages of LLM training and why the data used at each stage must be harmonized for the model to be effective. Examine two research vignettes from Bach's lab. The first shows how to adapt GenAI models to new domains by automatically generating instruction-tuning data. The second reveals potential safety vulnerabilities in GPT-4 for low-resource languages that stem from improperly harmonized data. Accompanying slides and additional resources are available to deepen your understanding of data harmonization in GenAI development and of how careful data management affects model performance and safety.

Syllabus

Why GenAI Needs Careful Training Data Management


Taught by

Snorkel AI

Related Courses

Towards Reliable Use of Large Language Models - Better Detection, Consistency, and Instruction-Tuning
Simons Institute via YouTube
Role of Instruction-Tuning and Prompt Engineering in Clinical Domain - MedAI 125
Stanford University via YouTube
Generative AI Advance Fine-Tuning for LLMs
IBM via Coursera
SeaLLMs - Large Language Models for Southeast Asia
VinAI via YouTube
Fine-tuning LLMs with Hugging Face SFT and QLoRA - LLMOps Techniques
LLMOps Space via YouTube