YoVDO

Gatekeeping Data Quality with Apache Iceberg, Toree, and Airflow

Offered By: The ASF via YouTube

Tags

Apache Airflow Courses, Jupyter Notebooks Courses, Data Engineering Courses, Metadata Courses, Data Validation Courses, Data Pipelines Courses, Apache Iceberg Courses

Course Description

Overview

Explore the role of data quality in data engineering and learn how to build efficient data pipelines at scale. Get an overview of three open-source Apache projects working together: Apache Iceberg as a scalable table format with ACID guarantees, Apache Toree for interactive computation, and Apache Airflow for orchestrating automated data workflows. Discover how the column-level statistics that Iceberg stores in its metadata can be used for efficient, reliable data quality validation. Walk through a practical example that uses a Jupyter notebook with Apache Toree to customize data audit and analysis steps before data is published. Watch a demonstration of how Airflow sensors and operators orchestrate these workflows at scale. Come away with tools and practices for proactive data quality assurance that you can apply in your own data engineering projects.
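As a rough illustration of auditing from column-level statistics rather than scanning rows, here is a minimal Python sketch. It is not the talk's implementation and does not call any Iceberg API: `FileStats` and `audit` are hypothetical names, and the statistics are keyed by column name for readability, which is an assumption of this sketch. The field names are modeled loosely on the per-file record counts, null counts, and lower/upper bounds that a table format can keep in metadata.

```python
from dataclasses import dataclass, field


@dataclass
class FileStats:
    """Hypothetical stand-in for per-file column statistics held in table metadata."""
    record_count: int
    null_value_counts: dict = field(default_factory=dict)
    lower_bounds: dict = field(default_factory=dict)
    upper_bounds: dict = field(default_factory=dict)


def audit(files, not_null=(), ranges=None):
    """Check metadata-only rules; return a list of violations (empty means pass).

    A column with no recorded statistic is skipped, not failed (an assumption
    of this sketch; a stricter audit might treat missing stats as a failure).
    """
    ranges = ranges or {}
    violations = []
    for index, stats in enumerate(files):
        # Rule 1: columns that must never contain nulls.
        for column in not_null:
            nulls = stats.null_value_counts.get(column, 0)
            if nulls > 0:
                violations.append(f"file {index}: {column} has {nulls} null(s)")
        # Rule 2: columns whose values must stay within [low, high].
        for column, (low, high) in ranges.items():
            lower = stats.lower_bounds.get(column)
            upper = stats.upper_bounds.get(column)
            if lower is not None and lower < low:
                violations.append(f"file {index}: {column} min {lower} < {low}")
            if upper is not None and upper > high:
                violations.append(f"file {index}: {column} max {upper} > {high}")
    return violations


if __name__ == "__main__":
    staged = [
        FileStats(100, {"id": 0}, {"amount": 1}, {"amount": 50}),
        FileStats(10, {"id": 2}, {"amount": -5}, {"amount": 20}),
    ]
    problems = audit(staged, not_null=["id"], ranges={"amount": (0, 100)})
    # Publish only when the audit is clean; otherwise report and stop.
    for problem in problems:
        print(problem)
```

In an orchestrated pipeline, a check like this would typically sit between the write step and the publish step, so that a non-empty result blocks publication.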

Syllabus

Gatekeep Iceberg data quality with Apache Toree and Airflow


Taught by

The ASF

Related Courses

Building Modern Data Streaming Apps with Open Source
Linux Foundation via YouTube
How to Stabilize a GenAI-First Modern Data LakeHouse - Provisioning 20,000 Ephemeral Data Lakes per Year
CNCF [Cloud Native Computing Foundation] via YouTube
Data Storage and Queries
DeepLearning.AI via Coursera
Delivering Portability to Open Data Lakes with Delta Lake UniForm
Databricks via YouTube
Fast Copy-On-Write in Apache Parquet for Data Lakehouse Upserts
Databricks via YouTube