YoVDO

Scaling Data and ML with Apache Spark and Feast - Feature Engineering for Production

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Big Data Courses, Machine Learning Courses, Feature Engineering Courses, Data Ingestion Courses, MLflow Courses, Feast Courses

Course Description

Overview

Explore how Gojek, Indonesia's first billion-dollar startup, uses big data and machine learning to power decision-making across its products in this 38-minute talk. Discover the challenges of feature engineering for large-scale ML systems and learn how Feast, an open-source feature store that works alongside Apache Spark and MLflow, addresses them. See how democratizing feature creation, sharing, and management affects time-to-market and innovation. Compare the machine learning lifecycle before and after Feast to understand its role in overcoming data scaling and feature serving challenges. Walk through practical aspects of using Feast, including creating entities, ingesting data, ensuring point-in-time correctness, and validating features. Conclude with a look at the value Feast unlocks for organizations and its future roadmap.
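
For readers unfamiliar with point-in-time correctness, the idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, not Feast's API or the speaker's code: the function name `point_in_time_join`, the `driver_*` entity IDs, and the sample values are all hypothetical. Each training event is matched with the most recent feature value recorded at or before the event's timestamp, so no information from the future leaks into the training data.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(events, feature_rows):
    """Attach to each (entity_id, event_time) event the latest feature value
    observed at or before event_time for that entity (None if there is none).

    Hypothetical helper for illustration; not part of Feast.
    """
    # Group feature observations by entity, ordered by timestamp.
    by_entity = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = by_entity.setdefault(entity_id, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for entity_id, event_time in events:
        times, values = by_entity.get(entity_id, ([], []))
        # Number of observations with timestamp <= event_time.
        idx = bisect_right(times, event_time)
        joined.append((entity_id, event_time, values[idx - 1] if idx else None))
    return joined


# Made-up sample data: a feature value observed at different times per entity.
feature_rows = [
    ("driver_1", datetime(2020, 1, 1, 10), 0.5),
    ("driver_1", datetime(2020, 1, 1, 12), 0.9),
    ("driver_2", datetime(2020, 1, 1, 11), 0.2),
]
events = [
    ("driver_1", datetime(2020, 1, 1, 11)),  # sees only the 10:00 value
    ("driver_1", datetime(2020, 1, 1, 13)),  # sees the 12:00 value
    ("driver_2", datetime(2020, 1, 1, 9)),   # no value observed yet
]
training_rows = point_in_time_join(events, feature_rows)
```

A naive join on entity ID alone would give the 11:00 event the 12:00 feature value, which the model could never have seen at prediction time.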

Syllabus

Intro
Machine learning at Gojek
Machine learning life cycle prior to Feast
Problems with end-to-end ML systems
Feast background
Machine learning life cycle with Feast
What is Feast?
What is Feast not?
Create entities and features using feature sets
Ingesting a DataFrame into Feast
Ingesting streams into Feast
What happens to the data?
Feature references and retrieval
Events throughout time
Ensuring point-in-time correctness
Point-in-time joins
Getting features for model training
Getting features during online serving
Feature validation in Feast
Infer TFDV schemas for features
Visualize and validate training dataset
What value does Feast unlock?
Roadmap


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera