Gatekeeping Data Quality with Apache Iceberg, Toree, and Airflow
Offered By: The ASF via YouTube
Course Description
Overview
Explore the critical role of data quality in data engineering and learn how to build efficient, insightful data pipelines at scale. Get an overview of using Apache Iceberg as a scalable table format with ACID guarantees, Apache Toree for interactive computation, and Apache Airflow for orchestrating automated data workflows. Discover how the column-level statistics that Iceberg stores in its metadata can be used for efficient and reliable data quality validation. Examine a practical example that uses a Jupyter notebook with Apache Toree to customize data audit and analysis steps before publication. Watch a demonstration of how Apache Airflow combines sensors and operators to orchestrate workflows at scale. Come away with insights and tools for proactive data quality assurance, built on open-source Apache projects, that you can apply to data quality practices in your own data engineering projects.
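The idea of validating data from column-level statistics before publication can be illustrated with a minimal Python sketch. It does not call Iceberg, Toree, or Airflow; the `ColumnStats` class, the `audit_column` and `gate` functions, and the rule names are all hypothetical, and the statistics are modeled as plain values of the kind a table format keeps per column (value count, null count, lower and upper bounds). How the talk itself reads these statistics is not stated in this description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnStats:
    """Hypothetical stand-in for per-column statistics taken from table metadata."""
    value_count: int                      # total values, nulls included
    null_count: int                       # how many of those values are null
    lower_bound: Optional[float] = None   # smallest non-null value, if known
    upper_bound: Optional[float] = None   # largest non-null value, if known


def audit_column(name: str, stats: ColumnStats, max_null_ratio: float = 0.0,
                 min_allowed: Optional[float] = None,
                 max_allowed: Optional[float] = None) -> List[str]:
    """Return a list of human-readable violations for one column."""
    if stats.value_count == 0:
        return [f"{name}: no values written"]
    problems = []
    null_ratio = stats.null_count / stats.value_count
    if null_ratio > max_null_ratio:
        problems.append(f"{name}: null ratio {null_ratio:.2%} exceeds {max_null_ratio:.2%}")
    if (min_allowed is not None and stats.lower_bound is not None
            and stats.lower_bound < min_allowed):
        problems.append(f"{name}: lower bound {stats.lower_bound} is below {min_allowed}")
    if (max_allowed is not None and stats.upper_bound is not None
            and stats.upper_bound > max_allowed):
        problems.append(f"{name}: upper bound {stats.upper_bound} is above {max_allowed}")
    return problems


def gate(stats_by_column: Dict[str, ColumnStats],
         rules: Dict[str, dict]) -> List[str]:
    """Apply every rule; an empty result means the data may be published."""
    violations = []
    for name, rule in rules.items():
        if name not in stats_by_column:
            violations.append(f"{name}: no statistics available")
            continue
        violations.extend(audit_column(name, stats_by_column[name], **rule))
    return violations


if __name__ == "__main__":
    # Made-up statistics and rules, for illustration only.
    stats = {
        "amount": ColumnStats(value_count=100, null_count=0, lower_bound=1.0, upper_bound=50.0),
        "user_id": ColumnStats(value_count=100, null_count=5),
    }
    rules = {
        "amount": {"min_allowed": 0.0},
        "user_id": {"max_null_ratio": 0.01},
    }
    for violation in gate(stats, rules):
        print(violation)
```

Because the check reads only summary statistics and never scans rows, it stays cheap on large tables. An orchestrated workflow could run a check like this as an audit step and publish the data only when the list of violations is empty.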
Syllabus
Gatekeep Iceberg data quality with Apache Toree and Airflow
Taught by
The ASF
Related Courses
Metadata: Organizing and Discovering Information
The University of North Carolina at Chapel Hill via Coursera
Gérer les documents numériques : maîtriser les risques
CNAM via France Université Numerique
Research Data Management and Sharing
The University of North Carolina at Chapel Hill via Coursera
SharePoint Enterprise Content Management
Microsoft via edX
Configuration Management on Google Cloud Platform
Google via Coursera