Apache XTable - Interoperability Among Lakehouse Table Formats
Offered By: Databricks via YouTube
Course Description
Overview
This 36-minute conference talk by Dipankar Mazumdar and Kyle Weller of Onehouse covers interoperability among lakehouse table formats. Apache Hudi, Delta Lake, and Apache Iceberg are the leading open source projects in this space. Each provides transaction and metadata layer primitives on top of decoupled storage, and each has its own distinguishing features, which makes choosing a single format difficult. These formats store data in open columnar file formats such as Parquet, alongside metadata for schema, commit history, partitions, and column statistics. The talk introduces XTable, an open-source project that provides omnidirectional interoperability between table formats without introducing a new format. Its metadata translation abstractions let you write data in any one format and convert it to target formats that different compute engines can consume, which eases the problem of format selection and interoperability in lakehouse workloads. Additional resources on data lakehouse concepts and fundamentals are suggested for viewing after the talk.
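To make the idea of metadata translation concrete, here is a minimal toy sketch in Python. It is not XTable's API or implementation: the `DataFile`, `Snapshot`, `to_target`, and `sync` names, the file paths, and the metadata layout are all hypothetical and chosen only for illustration. It assumes that a table state can be described in a format-neutral way (schema, partitions, file list) and that each target's metadata can point at the same Parquet files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """One existing Parquet file (hypothetical, simplified)."""
    path: str
    partition: str
    record_count: int


@dataclass(frozen=True)
class Snapshot:
    """Format-neutral description of one table state (hypothetical)."""
    schema: dict
    files: tuple


def to_target(snapshot: Snapshot, target_format: str) -> dict:
    """Build toy metadata for one target format.

    Only metadata is produced; the data files are referenced by path.
    """
    return {
        "format": target_format,
        "schema": dict(snapshot.schema),
        "partitions": sorted({f.partition for f in snapshot.files}),
        "files": [f.path for f in snapshot.files],
        "total_records": sum(f.record_count for f in snapshot.files),
    }


def sync(snapshot: Snapshot, target_formats: list) -> dict:
    """Translate one source snapshot into metadata for every target format."""
    return {fmt: to_target(snapshot, fmt) for fmt in target_formats}


# Illustrative source table state; paths and counts are made up.
snapshot = Snapshot(
    schema={"id": "long", "city": "string"},
    files=(
        DataFile("data/city=NYC/part-0.parquet", "city=NYC", 100),
        DataFile("data/city=SF/part-1.parquet", "city=SF", 50),
    ),
)

targets = sync(snapshot, ["DELTA", "ICEBERG"])
for name, metadata in targets.items():
    print(name, metadata["files"])
```

In this sketch both targets list the same file paths, which is the point of the illustration: the translation step works on metadata, and each compute engine reads the table through the metadata of the format it understands.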
Syllabus
Apache XTable (incubating): Interoperability Among Lakehouse Table Formats
Taught by
Databricks
Related Courses
Python for Data Science Tips, Tricks, & Techniques (LinkedIn Learning)
Sound Data Engineering in Rust - From Bits to DataFrames (Databricks via YouTube)
Recent Parquet Improvements in Apache Spark - Vectorized Complex Types and Column Index Support (Databricks via YouTube)
Optimizing Spark SQL Jobs with Parallel and Asynchronous IO (Databricks via YouTube)
Degrading Performance - Understanding and Solving Small Files Syndrome (Databricks via YouTube)