Unsupervised Machine Learning for Scaling Data Quality Monitoring in Databricks

Offered By: Databricks via YouTube

Tags

Unsupervised Machine Learning Courses
Data Visualization Courses
Databricks Courses
Anomaly Detection Courses
Root Cause Analysis Courses

Course Description

Overview

This 37-minute conference talk shows how unsupervised machine learning can be used to monitor data quality at scale in Databricks. It covers the limitations of traditional rule-based and metric-based approaches and introduces a set of fully unsupervised algorithms: how they work, their strengths and weaknesses, and how they are tested and calibrated. A worked example uses ticket sales data to set up monitoring in Anomalo and walks through the resulting visualizations (severity, explanation, distribution, and root cause analysis). The talk then outlines the underlying method: encoding features automatically, building a supervised model, and generating visualizations from SHAP values. It closes with implementation and testing challenges and how to get started in Databricks.
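
The method named in the overview (encode features automatically, build a supervised model, explain what it learned) can be illustrated with a small sketch. This is not the talk's or Anomalo's implementation. It assumes one common reading of the idea: label rows from a baseline window 0 and rows from the current window 1, train a classifier to tell them apart, and treat a held-out AUC well above 0.5 as a sign that the data changed. The function names, the `price` and `channel` columns, and the use of plain logistic regression are all hypothetical, and the model weights stand in only crudely for SHAP values.

```python
import math
import random


def encode_rows(rows):
    """Standardize numeric columns and one-hot encode string columns."""
    columns = sorted(rows[0])
    numeric = [c for c in columns if isinstance(rows[0][c], (int, float))]
    categorical = [c for c in columns if c not in numeric]
    stats = {}
    for c in numeric:
        vals = [r[c] for r in rows]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        stats[c] = (mean, std)
    levels = [(c, v) for c in categorical for v in sorted({r[c] for r in rows})]
    names = numeric + [f"{c}={v}" for c, v in levels]
    matrix = [
        [(r[c] - stats[c][0]) / stats[c][1] for c in numeric]
        + [1.0 if r[c] == v else 0.0 for c, v in levels]
        for r in rows
    ]
    return names, matrix


def linear_score(w, b, x):
    return b + sum(wj * xj for wj, xj in zip(w, x))


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Batch gradient descent for logistic regression (stand-in model)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = max(-30.0, min(30.0, linear_score(w, b, xi)))  # avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def drift_score(baseline_rows, current_rows, seed=0):
    """Return (held-out AUC, weight per encoded feature) for baseline vs current."""
    rows = baseline_rows + current_rows
    labels = [0] * len(baseline_rows) + [1] * len(current_rows)
    names, X = encode_rows(rows)
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 7 // 10  # 70% train, 30% held out
    train, test = idx[:cut], idx[cut:]
    w, b = fit_logistic([X[i] for i in train], [labels[i] for i in train])
    scores = [linear_score(w, b, X[i]) for i in test]
    return auc(scores, [labels[i] for i in test]), dict(zip(names, w))


def make_ticket_rows(n, rng, mean_price=50.0, web_share=0.6):
    """Synthetic ticket sales rows; the columns are made up for illustration."""
    return [
        {
            "price": rng.gauss(mean_price, 10.0),
            "channel": "web" if rng.random() < web_share else "box_office",
        }
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    baseline = make_ticket_rows(400, rng)
    shifted = make_ticket_rows(400, rng, mean_price=65.0)  # injected anomaly
    score, weights = drift_score(baseline, shifted)
    print("held-out AUC:", round(score, 3))
    print("feature weights:", weights)
```

When the two windows come from the same distribution, the classifier has nothing to learn and the held-out AUC stays near 0.5. When a column shifts, the AUC rises and the largest weights point at the columns responsible, which is the role SHAP values play for the non-linear models a production system would more likely use.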

Syllabus

Intro
Data Quality in the Modern Data Stack
Three Approaches to Data Quality Monitoring
Ticket Sales Data
Setup Monitoring in Anomalo
Anomalo Monitoring
Chaos Library
Check Log
Visualizations: Severity & Explanation
Visualizations: Distribution
Visualizations: Root Cause Analysis
Encode Features Automatically
Build a Supervised Model
Generate Visualizations Using SHAP Values
Challenges
Testing
Get Started in Databricks
DATA+AI SUMMIT 2022


Taught by

Databricks

Related Courses

Intro to Statistics
Stanford University via Udacity
Introduction to Data Science
University of Washington via Coursera
Passion Driven Statistics
Wesleyan University via Coursera
Information Visualization
Indiana University via Independent
DCO042 - Python For Informatics
University of Michigan via Independent