The Killer Feature Store - Orchestrating Spark ML Pipelines and MLflow for Production
Offered By: Databricks via YouTube
Course Description
Overview
Explore the concept of feature stores in data architecture and their role in productionizing ML applications through this 25-minute conference talk. Learn about the challenges of managing data and deploying applications in experimental, data-driven research environments, particularly in production ML pipelines with interdependent modeling and featurization stages. Discover how to implement a feature store as an orchestration engine for ML pipeline stages using Spark and MLflow, going beyond the traditional role of a metadata repository. Gain insights into breaking down ML pipeline deployment, avoiding the 'clone and own' anti-pattern, and isolating pipeline orchestration concerns. Explore novel algorithms for pipeline stage orchestration, data models for feature stage metadata, and concrete system designs using open source tools. Understand the state of feature stores in industry through a survey of reference architectures, open source repositories, and client experiences. Walk away with practical system designs and innovative algorithms to inspire your own feature store implementation.
Syllabus
Introduction
Common Problem
What's the effort
Semantics
Machine Learning Example
Customer Segmentation Example
Train Test Split Example
Feature Management
Automation
ML Pipeline
Pipeline Overview
Why does it exist
Pipeline deployment
Pipeline stage declaration
Pipeline construction
Vectorizing text
Demo
ML pipeline orchestration API
Taught by
Databricks
Related Courses
First Nights - Berlioz’s Symphonie Fantastique and Program Music in the 19th Century
Harvard University via edX
Azure Application Deployment and Management
Microsoft via edX
Building Modern Nodejs Applications on AWS
Amazon Web Services via edX
Implementation Strategies: Cloud Computing
The University of British Columbia via edX
Introducción a Contenedores con Docker y Kubernetes
IBM via Coursera