YoVDO

Fast Copy-On-Write in Apache Parquet for Data Lakehouse Upserts

Offered By: Databricks via YouTube

Tags

GDPR Courses
Delta Lake Courses
Apache Parquet Courses
Apache Iceberg Courses
Apache Hudi Courses

Course Description

Overview

Discover an approach to efficient ACID table upserts in data lakehouses through this 35-minute conference talk. Learn how partial copy-on-write is implemented within Parquet using row-level indexing to significantly improve upsert performance. Explore how this technique addresses critical use cases such as the GDPR Right to be Forgotten and Change Data Capture, overcoming limitations in existing solutions like Delta Lake, Apache Iceberg, and Apache Hudi. Understand the mechanics behind skipping unnecessary column chunks, resulting in up to 20x faster upserts compared to conventional methods. Gain insights from Mingmin Chen, Director of Engineering, and Xinli Shang, Engineering Manager at Uber Technologies, Inc., as they share their expertise on enhancing data lakehouse operations.
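
The idea of partial copy-on-write can be illustrated with a small toy model. The sketch below is not the speakers' implementation and does not use any Parquet library: a "file" is a list of in-memory row groups, each column chunk is a plain Python list, and the names RowGroup, build_row_index, and partial_copy_on_write are hypothetical. It assumes that a row-level index maps each key to its row group and row offset, so that an upsert rewrites only the column chunks that contain a changed value and reuses every other chunk as-is.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    # Toy stand-in for a Parquet row group: column name -> list of values.
    # Each list plays the role of one column chunk.
    columns: dict


def build_row_index(row_groups, key_col):
    """Map each key to (row_group_number, row_offset). Hypothetical helper."""
    index = {}
    for g, group in enumerate(row_groups):
        for r, key in enumerate(group.columns[key_col]):
            index[key] = (g, r)
    return index


def partial_copy_on_write(row_groups, index, updates):
    """Apply updates (key -> {column: new_value}) to existing keys.

    Returns (new_row_groups, rewritten_chunks, reused_chunks).
    The input row groups are never modified.
    """
    # Collect the edits per (row group, column) using the row-level index.
    touched = {}
    for key, changes in updates.items():
        g, r = index[key]
        for col, value in changes.items():
            touched.setdefault((g, col), []).append((r, value))

    new_groups, rewritten, reused = [], 0, 0
    for g, group in enumerate(row_groups):
        new_cols = {}
        for col, chunk in group.columns.items():
            edits = touched.get((g, col))
            if edits is None:
                # Untouched chunk: carried over by reference, no rewrite.
                new_cols[col] = chunk
                reused += 1
            else:
                # Touched chunk: copy it, then patch the changed rows.
                new_chunk = list(chunk)
                for r, value in edits:
                    new_chunk[r] = value
                new_cols[col] = new_chunk
                rewritten += 1
        new_groups.append(RowGroup(new_cols))
    return new_groups, rewritten, reused


# Illustrative data only.
groups = [
    RowGroup({"id": [1, 2], "city": ["SF", "NYC"], "fare": [10.0, 12.5]}),
    RowGroup({"id": [3, 4], "city": ["LA", "SEA"], "fare": [8.0, 9.5]}),
]
index = build_row_index(groups, "id")
new_groups, rewritten, reused = partial_copy_on_write(
    groups, index, {3: {"fare": 11.0}}
)
print(rewritten, reused)
```

In this toy run, one of the six column chunks is rewritten and the other five are reused. A conventional copy-on-write would rewrite all six, which is the kind of unnecessary work the talk describes skipping.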

Syllabus

Fast Copy-On-Write in Apache Parquet for Data Lakehouse Upserts


Taught by

Databricks

Related Courses

Building Modern Data Streaming Apps with Open Source
Linux Foundation via YouTube
How to Stabilize a GenAI-First Modern Data LakeHouse - Provisioning 20,000 Ephemeral Data Lakes per Year
CNCF [Cloud Native Computing Foundation] via YouTube
Data Storage and Queries
DeepLearning.AI via Coursera
Delivering Portability to Open Data Lakes with Delta Lake UniForm
Databricks via YouTube
Capital One's Data Innovation Strategy - You Build, Your Data (YBYD)
Databricks via YouTube