Building Reproducible ML Processes with an Open Source Stack
Offered By: Linux Foundation via YouTube
Course Description
Overview
Explore the essential components for creating reproducible machine learning experiments in this 33-minute conference talk. Learn how to combine Code (KubeFlow and Git), Data (Minio+lakeFS), and Environment (Infrastructure-as-code) to ensure true reproducibility. Witness a hands-on demonstration of reproducing an experiment while maintaining the exact input data, code, and processing environment from a previous run. Discover programmatic methods to integrate all aspects, including creating commits for data snapshots, tagging, and traversing the history of both code and data simultaneously. Gain insights into overcoming the limitations of MLFlow Projects in ensuring data reproducibility for comprehensive machine learning processes.
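As a rough illustration of what tying code, data, and environment together programmatically might look like, the sketch below pins all three in a single manifest and lets a run be looked up by tag. It is not taken from the talk and does not use the lakeFS, Git, or MLFlow APIs. The names RunManifest, RunRegistry, and digest_environment are hypothetical, and the in-memory registry only stands in for wherever such records would actually be stored.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RunManifest:
    """Pins the three inputs that together determine an experiment run."""
    code_commit: str  # assumed: a Git commit SHA
    data_commit: str  # assumed: a commit ID from a data versioning layer
    env_digest: str   # assumed: a hash of the infrastructure-as-code definition

    def run_id(self) -> str:
        # Deterministic ID: identical inputs always yield the same ID.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def digest_environment(spec_text: str) -> str:
    """Hash the text of an environment definition (hypothetical helper)."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


class RunRegistry:
    """In-memory stand-in for a persistent store of run manifests."""

    def __init__(self) -> None:
        self._runs: Dict[str, RunManifest] = {}
        self._tags: Dict[str, str] = {}

    def record(self, manifest: RunManifest, tag: Optional[str] = None) -> str:
        # Store the manifest under its run ID and optionally attach a tag.
        run_id = manifest.run_id()
        self._runs[run_id] = manifest
        if tag is not None:
            self._tags[tag] = run_id
        return run_id

    def resolve(self, ref: str) -> RunManifest:
        # Accept either a tag or a run ID and return the pinned inputs.
        run_id = self._tags.get(ref, ref)
        return self._runs[run_id]
```

Reproducing a run under this scheme would mean resolving its tag and then checking out the recorded code commit, reading data at the recorded data commit, and rebuilding the environment from the definition whose digest matches.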
Syllabus
Building Reproducible ML Processes with an Open Source Stack - Einat Orr, Treeverse
Taught by
Linux Foundation
Related Courses
Object Storage Driven Machine Learning Workloads (Linux Foundation via YouTube)
Writing Machine Learning Pipelines Against Object Storage (Linux Foundation via YouTube)
Introduction to KubeFlow: Using and Use Cases (Linux Foundation via YouTube)
Building a Cloud Native Storage Service - Dropbox Example (Linux Foundation via YouTube)
The Fallacies of Distributed Computing (Gopher Academy via YouTube)