Retooling AI Training Sets for Improved Model Performance
Offered By: Snorkel AI via YouTube
Course Description
Overview
Explore the importance of training datasets in AI breakthroughs through this 26-minute talk by Ludwig Schmidt, Assistant Professor of Computer Science at the University of Washington. Learn about DataComp, a benchmark designed to shift research focus from model architectures to dataset innovation. Discover how researchers can propose new training sets drawn from a fixed pool of 12.8B image-text pairs collected from Common Crawl. Understand the evaluation process, which uses standardized CLIP training code and 38 downstream test sets. Examine the multiple scales of the DataComp benchmark, which enable studies of scaling trends and accommodate researchers with varying compute resources. Gain insights into the promising results of baseline experiments, including the introduction of the DataComp-1B dataset, which outperforms OpenAI's CLIP model on ImageNet while using the same compute budget. Compare DataComp-1B with LAION-5B, where the improved data translates into a 9x reduction in compute cost. Delve into the potential of the DataComp workflow for advancing multimodal datasets and enhancing AI training methodologies.
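To make the workflow concrete, below is a minimal, hypothetical sketch of the kind of pool-filtering baseline the talk describes: given precomputed CLIP image and text embeddings for candidate pairs from the fixed pool, keep only the pairs whose cosine similarity clears a threshold. The function name, embedding shapes, and threshold value are illustrative assumptions, not details taken from the talk or the DataComp codebase.

```python
import numpy as np

def clip_score_filter(image_embs: np.ndarray,
                      text_embs: np.ndarray,
                      threshold: float = 0.3) -> np.ndarray:
    """Return indices of image-text pairs whose CLIP cosine
    similarity exceeds `threshold`.

    image_embs, text_embs: (N, D) arrays of precomputed CLIP
    embeddings for a candidate pool (e.g. one shard of the
    12.8B-pair pool). The threshold here is illustrative.
    """
    # L2-normalize so the row-wise dot product equals cosine similarity.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = np.sum(image_embs * text_embs, axis=1)
    return np.nonzero(scores > threshold)[0]

# Toy usage: filter a pool of 5 random stand-in "embeddings".
rng = np.random.default_rng(0)
img = rng.normal(size=(5, 512))
txt = rng.normal(size=(5, 512))
keep = clip_score_filter(img, txt, threshold=0.0)
print(f"kept {len(keep)} of 5 pairs")
```

The design point this sketch illustrates is the one the talk emphasizes: the model and training code stay fixed, so the only lever a participant turns is which subset of the pool survives filtering.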
Syllabus
Why You Should Retool Your AI Training Set (Not Your Model)
Taught by
Snorkel AI
Related Courses
Investment Strategies and Portfolio Analysis (Rice University via Coursera)
Advanced R Programming (Johns Hopkins University via Coursera)
Supply Chain Analytics (Rutgers University via Coursera)
Technological Entrepreneurship (Moscow Institute of Physics and Technology via Coursera)
Learn How To Code: Google's Go (golang) Programming Language (Udemy)