Assuring Data Quality at Scale
Offered By: Devoxx via YouTube
Course Description
Overview
Explore why data quality matters in AI-driven enterprises in this 50-minute Devoxx conference talk. Delve into the challenges of maintaining high-quality data in modern data ecosystems, from both the stream processing and the batch processing perspective. Learn about the key dimensions and metrics of data quality, and discover an approach to implementing a scalable data quality platform. Gain insights into providing near real-time visibility of data quality issues, fitting this capability into an existing data ecosystem, and triggering remediation actions. Understand how data quality affects ML model outputs, their accuracy and relevance, and the cost of data engineering pipelines. Examine real-world examples of data quality issues and their consequences, and explore the concept of Data Mesh for building decentralized data products.
Syllabus
Introduction
About Gayathri
Pipelines
What is Data Quality
What is a Data-Driven Organization
Good Quality Data
Data Quality Issues
Real World Examples
Incomplete Data
Incorrect Data
Bad Customer Experience
Data Loss
Data Quality
Completeness
Accuracy
Timeliness
Subjectivity
Data Mesh
Data Matching
Changing Data Landscape
Four Key Principles
Ownership of Data Quality
Data Quality Is a Huge Space
Data Quality Capabilities
What Makes Effective Data Quality Monitoring
Offerings
Homegrown Options
Centralized Platform Approach
Platform Approach
Connectors
Data Infrastructure
Profiling
Checks
Alerts
Transparency
Challenges
Conclusion
References
Questions
Taught by
Devoxx