YoVDO

Data Quality Approach for Netflix Personalization Systems

Offered By: Databricks via YouTube

Tags

Big Data Courses
Data Visualization Courses
Machine Learning Courses
SQL Courses
ETL Courses

Course Description

Overview

Explore an approach to data quality for Netflix personalization systems in this 28-minute conference talk from Databricks. Learn about the challenges of maintaining data quality for machine learning models that are trained on hundreds of terabytes of data every day. Discover the infrastructure and methods used to keep data quality high: 'Swimlanes' for defining data boundaries, pipelines for aggregating metrics, visualization tools for observing changes over time, and automated audits for detecting data regressions. The talk also covers optimizing metric computations, defining metrics as SQL queries, and using Spark for the ETL jobs that power the visualization and audit tools.
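To make the idea of defining metrics as SQL queries and auditing them concrete, here is a minimal sketch in Python. It is not taken from the talk: the `facts` table, its columns, the metric names, and the 50% day-over-day threshold are all hypothetical, and the standard-library `sqlite3` module stands in for the Spark SQL jobs the overview mentions.

```python
import sqlite3

# Hypothetical metric definitions: each metric is a SQL query that returns
# (day, value) rows over a hypothetical `facts` table.
METRICS = {
    "row_count": "SELECT day, COUNT(*) FROM facts GROUP BY day ORDER BY day",
    "null_rate_country": (
        "SELECT day, AVG(CASE WHEN country IS NULL THEN 1.0 ELSE 0.0 END) "
        "FROM facts GROUP BY day ORDER BY day"
    ),
}


def compute_metrics(conn, metrics=METRICS):
    """Run each metric query and return {metric_name: {day: value}}."""
    return {name: dict(conn.execute(sql).fetchall()) for name, sql in metrics.items()}


def audit(series, max_relative_change=0.5):
    """Return the days whose value moved more than the threshold vs. the previous day."""
    flagged = []
    days = sorted(series)
    for prev, cur in zip(days, days[1:]):
        base = series[prev]
        if base == 0:
            # Relative change is undefined from a zero baseline; flag any movement.
            changed = series[cur] != 0
        else:
            changed = abs(series[cur] - base) / abs(base) > max_relative_change
        if changed:
            flagged.append(cur)
    return flagged


def demo():
    """Build a tiny in-memory table, compute the metrics, and audit each series."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (day TEXT, profile_id INTEGER, country TEXT)")
    rows = [
        ("2020-01-01", 1, "US"), ("2020-01-01", 2, "BR"),
        ("2020-01-01", 3, "IN"), ("2020-01-01", 4, "US"),
        ("2020-01-02", 1, "US"), ("2020-01-02", 2, "BR"),
        ("2020-01-02", 3, "IN"), ("2020-01-02", 4, None),
        ("2020-01-03", 1, None),
    ]
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    results = compute_metrics(conn)
    return results, {name: audit(series) for name, series in results.items()}
```

Calling `demo()` returns the per-day metric values together with the days each audit flagged. Keeping metric definitions as plain SQL strings means a new metric can be added without touching the aggregation or audit code, which is one plausible reason to define metrics this way.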

Syllabus

Intro
The Sock Universe?
Historical Fact Store Components
Netflix Pl Historical Fact Store
Causes of Bad Data
Bad Data Example 1: Drift
Drastic changes
Under Utilization
Summarizing...
Preetam B Joshi
Example Data
Aggregations
Automated Monitoring
Distribution Checks - Algorithm
Distribution Checks - Statistical Test
Data Quality Architecture
Plain old visualizations
Debugging via Visualization
Swimlanes
Questions?
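
The syllabus lists distribution checks backed by a statistical test but does not name the test. As an illustration only, the sketch below uses a two-sample Kolmogorov-Smirnov check, which is an assumption and not necessarily what the talk uses. The function names and the 0.05 significance level are hypothetical, and the threshold is the usual large-sample approximation.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the rejection threshold at level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def distribution_changed(baseline, current, alpha=0.05):
    """True if the current sample differs from the baseline at level alpha."""
    d = ks_statistic(baseline, current)
    return d > ks_critical_value(len(baseline), len(current), alpha)
```

An automated audit could call `distribution_changed` with yesterday's values of a feature as `baseline` and today's as `current`, and raise an alert when it returns `True`.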


Taught by

Databricks

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent