Leakage and the Reproducibility Crisis in ML-based Science

Offered By: Inside Livermore Lab via YouTube

Tags

Machine Learning Courses, Data Science Courses, Research Ethics Courses, Logistic Regression Courses

Course Description

Overview

This 48-minute talk examines data leakage and its role in the reproducibility crisis in machine learning-based science. It surveys reproducibility failures across 17 scientific fields, which together affect 329 papers and have led to overly optimistic conclusions. It presents a taxonomy of 8 types of leakage, ranging from basic errors to complex research challenges, and proposes methodological changes, including model info sheets, to catch leakage before publication. It also reports the results of a reproducibility study in civil war prediction: once data leakage is corrected, complex ML models do not outperform older statistical methods, so their reported advantage was an artifact of leakage. The speaker is Sayash Kapoor, a Ph.D. candidate at Princeton University whose research on ML methods in science has received recognition and been featured in prominent media outlets.

Syllabus

DSI | Leakage and the Reproducibility Crisis in ML-based Science


Taught by

Inside Livermore Lab

Related Courses

Statistical Learning with R (Stanford University via edX)
The Analytics Edge (Massachusetts Institute of Technology via edX)
Regression Models (Johns Hopkins University via Coursera)
Introduction à la statistique avec R (Université Paris SUD via France Université Numerique)
Statistical Reasoning for Public Health 2: Regression Methods (Johns Hopkins University via Coursera)