The Killer Feature Store - Orchestrating Spark ML Pipelines and MLflow for Production

Offered By: Databricks via YouTube

Course Description

Overview

Explore the concept of feature stores in data architecture and their role in productionizing ML applications through this 25-minute conference talk. Learn about the challenges of managing data and deploying applications in experimental, data-driven research environments, particularly in production ML pipelines with interdependent modeling and featurization stages. Discover how to implement a feature store as an orchestration engine for ML pipeline stages using Spark and MLflow, going beyond the traditional role of a metadata repository. Gain insights into breaking down ML pipeline deployment, avoiding the 'clone and own' anti-pattern, and isolating pipeline orchestration concerns. Explore novel algorithms for pipeline stage orchestration, data models for feature stage metadata, and concrete system designs using open source tools. Understand the state of feature stores in industry through a survey of reference architectures, open source repositories, and client experiences. Walk away with practical system designs and innovative algorithms to inspire your own feature store implementation.

Syllabus

Introduction
Common Problem
What's the effort
Semantics
Machine Learning Example
Customer Segmentation Example
Train Test Split Example
Feature Management
Automation
ML Pipeline
Pipeline Overview
Why does it exist
Pipeline deployment
Pipeline stage declaration
Pipeline construction
Vectorizing text
Demo
ML pipeline orchestration API

Taught by

Databricks
