Why GenAI Needs Careful Training Data Management
Offered By: Snorkel AI via YouTube
Course Description
Overview
Explore why managing training data matters for large language models in this 18-minute talk by Stephen Bach, assistant professor of computer science at Brown University. Discover the three sequential stages of LLM training and learn why harmonizing data across these stages is essential for model effectiveness. Examine two research vignettes from Bach's lab: one shows how to adapt GenAI models to new domains by automatically generating instruction tuning data, and the other reveals potential safety vulnerabilities in GPT-4 for low-resource languages caused by improperly harmonized data. Access the accompanying slides and additional resources to deepen your understanding of data harmonization in GenAI development. Gain insight into the complexities of LLM training and the impact of careful data management on model performance and safety.
Syllabus
Why GenAI Needs Careful Training Data Management
Taught by
Snorkel AI
Related Courses
Solving the Last Mile Problem of Foundation Models with Data-Centric AI
MLOps.community via YouTube
Foundational Models in Enterprise AI - Challenges and Opportunities
MLOps.community via YouTube
Knowledge Distillation Demystified: Techniques and Applications
Snorkel AI via YouTube
Model Distillation - From Large Models to Efficient Enterprise Solutions
Snorkel AI via YouTube
Curate Training Data via Labeling Functions - 10 to 100x Faster
Snorkel AI via YouTube