YoVDO

Iceberg's Best Secret: Exploring Metadata Tables

Offered By: The ASF via YouTube

Tags

Data Warehousing Courses, Time Travel Courses, Apache Iceberg Courses

Course Description

Overview

Explore Apache Iceberg's metadata capabilities in this 38-minute conference talk from ApacheCon 2022. Dive into the "secret sauce" of Iceberg's rich metadata, which enables core features like time travel, query optimizations, and optimistic concurrency handling. Learn how to access and query the metadata (system) tables to gain insight into your Iceberg data. See real-life queries for identifying recently updated partitions, investigating small file issues, and understanding data file filtering. Delve into advanced use cases such as data auditing and quality assessment, including tracking null value additions and data ingest latency. Pick up practical tips for optimizing metadata table performance and hear about ongoing community improvements. The talk is aimed at both experienced Iceberg users and those just getting started with this under-utilized feature.

Syllabus

Intro
What is Iceberg
Metadata files
Metadata tables
Partitions table
The newest table
Why are there so many tables
Partitions
Snapshots
Maintenance Operations
Expired Snapshots
Snapshots Summary
Optimize Metadata
Optimize Iceberg Data
Bonus
Data Quality
Puffin Files
Avro


Taught by

The ASF

Related Courses

Building Modern Data Streaming Apps with Open Source
Linux Foundation via YouTube
How to Stabilize a GenAI-First Modern Data LakeHouse - Provisioning 20,000 Ephemeral Data Lakes per Year
CNCF [Cloud Native Computing Foundation] via YouTube
Data Storage and Queries
DeepLearning.AI via Coursera
Delivering Portability to Open Data Lakes with Delta Lake UniForm
Databricks via YouTube
Fast Copy-On-Write in Apache Parquet for Data Lakehouse Upserts
Databricks via YouTube