Retooling AI Training Sets for Improved Model Performance
Offered By: Snorkel AI via YouTube
Course Description
Overview
Explore the importance of training datasets in AI breakthroughs through this 26-minute talk by Ludwig Schmidt, Assistant Professor of Computer Science at the University of Washington. Learn about DataComp, a benchmark designed to shift research focus from model architectures to dataset innovation. Discover how researchers can propose new training sets drawn from a fixed pool of 12.8B image-text pairs collected from Common Crawl. Understand the evaluation process, which uses standardized CLIP training code and 38 downstream test sets. Examine the multiple scales of the DataComp benchmark, which enable scaling-trend studies and accommodate researchers with varying compute resources. Gain insights into the promising results of baseline experiments, including the DataComp-1B dataset, which outperforms OpenAI's CLIP model on ImageNet while using the same compute budget. Compare the result to models trained on LAION-5B, where the improved data yields a 9x reduction in compute cost. Delve into the potential of the DataComp workflow for advancing multimodal datasets and improving AI training methodologies.
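To make the workflow concrete, here is a minimal sketch of the DataComp participation loop the talk describes: the candidate pool and the training/evaluation code are fixed by the benchmark, so a participant's only real design choice is the data-filtering rule. All names, the toy data, and the threshold-based filter below are illustrative assumptions, not the actual DataComp API.

```python
"""Sketch of a DataComp-style entry: choose a subset of a fixed pool,
then train and evaluate with standardized (here, stubbed) code."""

from typing import Dict, List

# A candidate image-text pair with a precomputed quality score
# (e.g., CLIP image-text similarity) -- an assumed, simplified format.
Pair = Dict[str, float]


def select_subset(pool: List[Pair], threshold: float = 0.3) -> List[Pair]:
    """The participant's contribution: a filtering rule. This one keeps
    pairs whose similarity score exceeds a threshold, a simple baseline
    strategy for dataset curation."""
    return [p for p in pool if p["clip_score"] > threshold]


def train_clip_fixed(subset: List[Pair]) -> str:
    """Stand-in for the benchmark's fixed CLIP training code."""
    return f"clip-model-trained-on-{len(subset)}-pairs"


def evaluate_downstream(model: str) -> Dict[str, float]:
    """Stand-in for evaluation on the 38 downstream test sets;
    returns per-task accuracies (placeholder values)."""
    return {"imagenet": 0.79, "retrieval": 0.61}


if __name__ == "__main__":
    pool = [{"clip_score": 0.1}, {"clip_score": 0.4}, {"clip_score": 0.6}]
    subset = select_subset(pool)          # the only step participants change
    model = train_clip_fixed(subset)      # fixed by the benchmark
    scores = evaluate_downstream(model)   # fixed by the benchmark
    print(f"average accuracy: {sum(scores.values()) / len(scores):.3f}")
```

Because training and evaluation are held constant, any change in the final score is attributable to the dataset alone, which is the point of the benchmark's design.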
Syllabus
Why You Should Retool Your AI Training Set (Not Your Model)
Taught by
Snorkel AI
Related Courses
Introduction to Artificial Intelligence (Stanford University via Udacity)
Computer Vision: The Fundamentals (University of California, Berkeley via Coursera)
Computational Photography (Georgia Institute of Technology via Coursera)
Einführung in Computer Vision (Technische Universität München (Technical University of Munich) via Coursera)
Introduction to Computer Vision (Georgia Institute of Technology via Udacity)