Retooling AI Training Sets for Improved Model Performance
Offered By: Snorkel AI via YouTube
Course Description
Overview
Explore the importance of training datasets in AI breakthroughs through this 26-minute talk by Ludwig Schmidt, Assistant Professor of Computer Science at the University of Washington. Learn about DataComp, a benchmark designed to shift research focus from model architectures to dataset innovation. Discover how researchers can propose new training sets drawn from a fixed pool of 12.8B image-text pairs collected from Common Crawl. Understand the evaluation process, which uses standardized CLIP training code and 38 downstream test sets. Examine the multiple scales of the DataComp benchmark, which enable scaling-trend studies and accommodate researchers with varying compute resources. Gain insights into the promising results of baseline experiments, including the DataComp-1B dataset, which outperforms OpenAI's CLIP model on ImageNet while using the same compute budget. Compare the result to models trained on LAION-5B, where the improved data yields a 9x reduction in compute cost. Delve into the potential of the DataComp workflow for advancing multimodal datasets and improving AI training methodologies.
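To make the workflow concrete, here is a minimal sketch of the DataComp participation loop the talk describes: the candidate pool and the training/evaluation code are fixed by the benchmark, so a participant's only real design choice is the data-filtering rule. All names, the toy data, and the threshold-based filter below are illustrative assumptions, not the actual DataComp API.

```python
"""Sketch of a DataComp-style entry: choose a subset of a fixed pool,
then train and evaluate with standardized (here, stubbed) code."""

from typing import Dict, List

# A candidate image-text pair with a precomputed quality score
# (e.g., CLIP image-text similarity) -- an assumed, simplified format.
Pair = Dict[str, float]


def select_subset(pool: List[Pair], threshold: float = 0.3) -> List[Pair]:
    """The participant's contribution: a filtering rule. This one keeps
    pairs whose similarity score exceeds a threshold, a simple baseline
    strategy for dataset curation."""
    return [p for p in pool if p["clip_score"] > threshold]


def train_clip_fixed(subset: List[Pair]) -> str:
    """Stand-in for the benchmark's fixed CLIP training code."""
    return f"clip-model-trained-on-{len(subset)}-pairs"


def evaluate_downstream(model: str) -> Dict[str, float]:
    """Stand-in for evaluation on the 38 downstream test sets;
    returns per-task accuracies (placeholder values)."""
    return {"imagenet": 0.79, "retrieval": 0.61}


if __name__ == "__main__":
    pool = [{"clip_score": 0.1}, {"clip_score": 0.4}, {"clip_score": 0.6}]
    subset = select_subset(pool)          # the only step participants change
    model = train_clip_fixed(subset)      # fixed by the benchmark
    scores = evaluate_downstream(model)   # fixed by the benchmark
    print(f"average accuracy: {sum(scores.values()) / len(scores):.3f}")
```

Because training and evaluation are held constant, any change in the final score is attributable to the dataset alone, which is the point of the benchmark's design.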
Syllabus
Why You Should Retool Your AI Training Set (Not Your Model)
Taught by
Snorkel AI
Related Courses
Introduction to Artificial Intelligence (Stanford University via Udacity)
Computer Vision: The Fundamentals (University of California, Berkeley via Coursera)
Computational Photography (Georgia Institute of Technology via Coursera)
Einführung in Computer Vision (Technische Universität München (Technical University of Munich) via Coursera)
Introduction to Computer Vision (Georgia Institute of Technology via Udacity)