Leakage and the Reproducibility Crisis in ML-based Science

Offered By: Inside Livermore Lab via YouTube

Course Description

Overview

This 48-minute talk examines data leakage and the reproducibility failures it causes in machine learning-based science. It surveys reproducibility failures across 17 scientific fields, which collectively affect 329 papers and have led to overly optimistic conclusions. It presents a taxonomy of 8 types of leakage, ranging from basic errors to open research problems, and proposes methodological changes, including model info sheets, for catching leakage before publication. It also reports a reproducibility study in civil war prediction, which finds that the apparent advantage of complex ML models over older statistical methods is an artifact of data leakage. The speaker is Sayash Kapoor, a Ph.D. candidate at Princeton University, whose research on ML methods in science has received recognition and been featured in prominent media outlets.

Syllabus

DSI | Leakage and the Reproducibility Crisis in ML-based Science

Taught by

Inside Livermore Lab
