Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects
Offered By: The ASF via YouTube
Course Description
Overview
Explore how lakeFS scales Git-like data versioning to billions of objects in modern data lake architectures during this 43-minute conference talk from ApacheCon 2022. Learn about the challenges of using object storage for data lakes and discover how lakeFS introduces Git-inspired concepts such as branching, committing, merging, and rolling back changes to ensure data quality and resiliency. Understand the scalability of lakeFS's Git-like data model for petabyte-scale data across billions of objects without compromising throughput or performance. Witness a demonstration of branching, writing data using Spark, and merging on a billion-object repository. Gain insights into solving common data lake problems and enhancing data management practices for large-scale object storage systems.
Syllabus
Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects - Amit Kesarwani
Taught by
The ASF
Related Courses
Multi-Table Transactions with LakeFS and Delta Lake - Tech Talk
Databricks via YouTube
CI/CD for Data - Building Dev/Test Data Environments with Open Source Stacks
CNCF [Cloud Native Computing Foundation] via YouTube
Building Reproducible ML Processes with an Open Source Stack
Linux Foundation via YouTube
Power Up Your Lakehouse with Git Semantics and Delta Lake
Databricks via YouTube
Version Control for Lakehouse Architecture - Essential Practices and Benefits
Databricks via YouTube