Assuring Data Quality at Scale
Offered By: Devoxx via YouTube
Course Description
Overview
Explore why data quality is critical in AI-driven enterprises in this 50-minute Devoxx conference talk. Delve into the challenges of maintaining high-quality data in modern data ecosystems from both stream and batch processing perspectives. Learn about the key dimensions and metrics of data quality, and discover an approach to implementing a scalable data quality platform. Gain insights into providing near real-time visibility of data quality issues, fitting this capability into an existing data ecosystem, and triggering remediation actions. Understand how data quality affects the accuracy and relevance of ML model outputs and the cost of data engineering pipelines. Examine real-world examples of data quality issues and their consequences, and explore the concept of Data Mesh for building decentralized data products.
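As a rough illustration of what "dimensions and metrics" can mean in practice, the sketch below computes two of the dimensions listed in the syllabus, completeness and timeliness, over a small batch of records and flags any metric that falls below a threshold. It is not taken from the talk: the function names, record fields, and threshold values are all hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta, timezone


def completeness(records, field):
    """Fraction of records whose `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def timeliness(records, ts_field, now, max_age):
    """Fraction of records whose timestamp is no older than `max_age`."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for r in records
        if r.get(ts_field) is not None and now - r[ts_field] <= max_age
    )
    return fresh / len(records)


def failed_checks(metrics, thresholds):
    """Names of metrics that fall below their minimum threshold."""
    return [name for name, minimum in thresholds.items() if metrics[name] < minimum]


# Hypothetical sample batch; a fixed `now` keeps the example deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"email": "a@example.com", "updated_at": now - timedelta(minutes=5)},
    {"email": "", "updated_at": now - timedelta(hours=3)},
    {"email": "c@example.com", "updated_at": now - timedelta(minutes=30)},
    {"email": None, "updated_at": now - timedelta(minutes=10)},
]

metrics = {
    "email_completeness": completeness(records, "email"),
    "freshness": timeliness(records, "updated_at", now, timedelta(hours=1)),
}

# Assumed minimum acceptable values; real thresholds depend on the data product.
alerts = failed_checks(metrics, {"email_completeness": 0.9, "freshness": 0.7})
print(metrics, alerts)
```

In a platform setting, a list like `alerts` would typically feed a notification or remediation step rather than being printed.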
Syllabus
Introduction
About Gayathri
Pipelines
What is Data Quality
What is a Data-Driven Organization
Good Quality Data
Data Quality Issues
Real World Examples
Incomplete Data
Incorrect Data
Bad Customer Experience
Data Loss
Data Quality
Completeness
Accuracy
Timeliness
Subjectivity
Data Mesh
Data Matching
Changing Data Landscape
Four Key Principles
Ownership of Data Quality
Data Quality is a Huge Space
Data Quality Capabilities
What Makes Effective Data Quality Monitoring
Offerings
Homegrown Options
Centralized Platform Approach
Platform Approach
Connectors
Data Infrastructure
Profiling
Checks
Alerts
Transparency
Challenges
Conclusion
References
Questions
Taught by
Devoxx
Related Courses
Introduction to Windows PowerShell
Microsoft via edX
Windows PowerShell Basics
Microsoft via edX
Preparing for Google Cloud Certification: Cloud Data Engineer
Google Cloud via Coursera
Data Engineering on Google Cloud Platform en Français
Google Cloud via Coursera
Data Engineering on Google Cloud Platform en Español
Google Cloud via Coursera