Modern ETL Pipelines with Change Data Capture - Building Resilient Data Streams

Offered By: Databricks via YouTube

Tags

Data Engineering Courses
Apache Spark Courses
Apache Airflow Courses
Data Lakes Courses
Data Streaming Courses
ETL Pipelines Courses
Debezium Courses

Course Description

Overview

Explore the development of a modern ETL pipeline using Debezium, Kafka, Spark, and Airflow in this 43-minute conference talk. Learn how GetYourGuide transformed their error-prone legacy system into a robust, schema-change-resilient pipeline capable of multiple daily data lake refreshes. Discover the architecture and implementation steps for building a Change Data Capture layer that streams database changes directly to Kafka. Gain insights into reducing operational time with Databricks and understand the benefits of fresh data for business users. The talk covers the extraction layer, schema service, data landscape, dependency management, transformation layer components, and the importance of testing. Explore special syntax elements, the small file problem, and data warehouse integration. Conclude with a Q&A session addressing how the new pipeline works and its read-write capabilities.
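As a rough illustration of what a Change Data Capture layer feeds downstream, the sketch below replays Debezium-style change events onto a keyed table snapshot. It relies only on the documented Debezium envelope fields `op`, `before`, and `after`. The function name `apply_change_events`, the `id` key field, and the sample rows are hypothetical and are not taken from the talk, which uses Kafka and Spark rather than in-memory dictionaries.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_change_events(
    snapshot: Dict[Any, Row],
    events: Iterable[Dict[str, Any]],
    key_field: str = "id",  # hypothetical primary-key column
) -> Dict[Any, Row]:
    """Apply Debezium-style change events to a keyed snapshot, in order."""
    result = dict(snapshot)  # copy the mapping so the caller's snapshot is untouched
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):
            # create, snapshot read, and update carry the new row image in "after"
            row = event["after"]
            result[row[key_field]] = row
        elif op == "d":
            # delete carries the old row image in "before"
            result.pop(event["before"][key_field], None)
        else:
            # other op codes (for example truncate) are out of scope for this sketch
            raise ValueError(f"unhandled op: {op!r}")
    return result


# Hypothetical sample events, made up for illustration.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "city": "Berlin"}},
    {"op": "c", "before": None, "after": {"id": 2, "city": "Zurich"}},
    # A column added upstream simply appears as an extra key in "after".
    {"op": "u", "before": {"id": 1, "city": "Berlin"},
     "after": {"id": 1, "city": "Berlin", "country": "DE"}},
    {"op": "d", "before": {"id": 2, "city": "Zurich"}, "after": None},
]

table = apply_change_events({}, events)
print(table)  # {1: {'id': 1, 'city': 'Berlin', 'country': 'DE'}}
```

Because each row is stored as a plain mapping, a new upstream column does not break the replay, which is one simple way to picture the schema-change resilience the talk describes.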

Syllabus

Introduction
Agenda
About GetYourGuide
Introduction to GetYourGuide
Legacy Pipelines
Introducing Rivulus
Extraction Layer
Debezium
Schema Service
Converter
Abject
Data Landscape
Dependency Management
Transformation Layer Components
Special Syntax Elements
Importance of Testing
Dependencies
Benefits
Next Steps
Questions
How it works
Read-Write
Small File Problem
Data Warehouse
Question


Taught by

Databricks

Related Courses

Keep Your Cache Always Fresh with Debezium
Devoxx via YouTube
ElasticSearch, MongoDB and Neo4j Walk Into a Bar - A Tale of Different Databases
Devoxx via YouTube
Data Streaming for Microservices Using Debezium
Devoxx via YouTube
Streaming Database Changes with Debezium
Devoxx via YouTube
Embracing Database Diversity with Kafka and Debezium
Devoxx via YouTube