Efficient, Low Latency Ingestion to Large Tables via Apache Flink and Apache Iceberg
Offered By: The ASF via YouTube
Course Description
Overview
Explore the challenges and solutions for efficient, low-latency data ingestion into large tables using Apache Flink and Apache Iceberg in this 24-minute conference talk. Learn about the tradeoff between low data-availability latency and a file layout optimized for efficient reads, and discover how the integration of these two Apache projects addresses it. Examine the ongoing projects that aim to balance frequent commits with good file management, including balanced writes and periodic compaction. Gain insights into the development process, the coordination between Apache communities, and implementation details. Compare this approach with alternatives such as Apache Hudi and Apache Paimon, and understand their pros and cons. The talk closes with a brief demo of what the integration makes possible. It is presented by Marton Balassi, a Flink PMC member and Engineering Manager at Apple, and Peter Vary, an Apache Iceberg committer and Staff Engineer at Apple.
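The latency-versus-read-efficiency tradeoff can be illustrated with a toy model. It assumes, as a simplification and not as Iceberg's actual implementation, that every commit closes one data file per parallel writer and that ingest is spread evenly across writers. Shorter commit intervals make data visible sooner but produce more, smaller files. A greedy bin-packing step then stands in for periodic compaction, grouping small files into rewrite groups no larger than a target size. The function names, the 8-writer setup, and the 2 MB/s ingest rate are hypothetical.

```python
# Toy model of the commit-frequency vs. file-count tradeoff.
# Assumption (hypothetical, for illustration only): each commit closes exactly
# one data file per parallel writer, and ingest is spread evenly across writers.


def files_per_hour(commit_interval_s: int, writers: int) -> int:
    """Number of data files created during one hour of steady ingestion."""
    commits = 3600 // commit_interval_s
    return commits * writers


def avg_file_size_mb(ingest_mb_per_s: float, commit_interval_s: int, writers: int) -> float:
    """Average size of each file written between two commits."""
    return ingest_mb_per_s * commit_interval_s / writers


def bin_pack(sizes_mb: list[float], target_mb: float) -> list[list[float]]:
    """Greedy first-fit-decreasing grouping of small files into compaction
    groups whose total size does not exceed target_mb.
    This is a stand-in for a compaction planner, not Iceberg's own planner."""
    groups: list[list[float]] = []
    totals: list[float] = []
    for size in sorted(sizes_mb, reverse=True):
        # Place the file in the first group that still has room.
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                groups[i].append(size)
                totals[i] += size
                break
        else:
            # No existing group fits, so open a new one.
            groups.append([size])
            totals.append(size)
    return groups


if __name__ == "__main__":
    for interval in (10, 60, 300):
        n = files_per_hour(interval, writers=8)
        size = avg_file_size_mb(2.0, interval, writers=8)
        print(f"commit every {interval:>3}s: {n:>5} files/hour, ~{size:.1f} MB each")

    # Compact one hour of 10-second commits into ~128 MB groups.
    small_files = [avg_file_size_mb(2.0, 10, 8)] * files_per_hour(10, 8)
    groups = bin_pack(small_files, target_mb=128.0)
    print(f"{len(small_files)} small files -> {len(groups)} compacted files")
```

Under these assumptions, committing every 10 seconds yields 2,880 files of about 2.5 MB per hour. Compaction reduces them to 57 files of at most 128 MB. This is the kind of balance between frequent commits and file management that the talk discusses.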
Syllabus
Efficient, Low Latency Ingestion to Large Tables via Apache Flink and Apache Iceberg
Taught by
The ASF