Iceberg's Best Secret: Exploring Metadata Tables
Offered By: The ASF via YouTube
Course Description
Overview
Explore Apache Iceberg's metadata capabilities in this 38-minute conference talk from ApacheCon 2022. The talk examines the "secret sauce" of Iceberg, its rich metadata, which enables core features such as time travel, query optimizations, and optimistic concurrency. Learn how to query Iceberg's metadata tables to gain insight into your data, with real-life queries for identifying recently updated partitions, investigating small-file problems, and understanding how data files are filtered. Advanced use cases cover data auditing and quality assessment, such as tracking the addition of null values and measuring data ingest latency. The talk also offers practical tips for improving metadata table performance and an update on ongoing community work. Whether you are an experienced Iceberg user or just getting started, this talk shows how to make the most of this under-utilized feature.
Syllabus
Intro
What is Iceberg
Metadata files
Metadata tables
Partitions table
The newest table
Why are there so many tables
Partitions
Snapshots
Maintenance Operations
Expired Snapshots
Snapshots Summary
Optimize Metadata
Optimize Iceberg Data
Bonus
Data Quality
Puffin Files
Avro
Taught by
The ASF