Leakage and the Reproducibility Crisis in ML-based Science
Offered By: Inside Livermore Lab via YouTube
Course Description
Overview
This 48-minute talk examines data leakage and the reproducibility failures it causes in machine learning-based science. It presents an investigation of reproducibility failures across 17 scientific fields, affecting 329 papers and leading to overly optimistic conclusions. The talk lays out a taxonomy of 8 types of leakage, ranging from basic errors to complex research challenges, and proposes methodological changes, including model info sheets, to prevent leakage before publication. It also reports a reproducibility study in civil war prediction, in which the apparent advantage of complex ML models over older statistical methods disappears once data leakage is corrected. The speaker is Sayash Kapoor, a Ph.D. candidate at Princeton University, whose research on ML methods in science has received recognition and been featured in prominent media outlets.
Syllabus
DSI | Leakage and the Reproducibility Crisis in ML-based Science
Taught by
Inside Livermore Lab
Related Courses
Statistical Learning with R (Stanford University via edX)
The Analytics Edge (Massachusetts Institute of Technology via edX)
Regression Models (Johns Hopkins University via Coursera)
Introduction à la statistique avec R (Université Paris SUD via France Université Numerique)
Statistical Reasoning for Public Health 2: Regression Methods (Johns Hopkins University via Coursera)