Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects
Offered By: The ASF via YouTube
Course Description
Overview
Explore how lakeFS scales Git-like data versioning to billions of objects in modern data lake architectures during this 43-minute conference talk from ApacheCon 2022. Learn about the challenges of using object storage for data lakes and discover how lakeFS introduces Git-inspired concepts such as branching, committing, merging, and rolling back changes to ensure data quality and resiliency. Understand the scalability of lakeFS's Git-like data model for petabyte-scale data across billions of objects without compromising throughput or performance. Witness a demonstration of branching, writing data using Spark, and merging on a billion-object repository. Gain insights into solving common data lake problems and enhancing data management practices for large-scale object storage systems.
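The Git-inspired operations named above (branching, committing, merging, and rolling back) can be illustrated with a small in-memory sketch. This is a toy model for intuition only, not the lakeFS API or its actual data model: the class `ToyRepo`, its method names, and the object paths and IDs are all hypothetical. It assumes that a commit is a snapshot mapping object paths to object IDs and that a branch is just a pointer to a commit, so creating a branch copies no data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Commit:
    snapshot: Dict[str, str]      # object path -> object ID (metadata only)
    parent: Optional["Commit"]    # previous commit on this branch
    message: str


class ToyRepo:
    """Hypothetical toy model of Git-like versioning over an object store."""

    def __init__(self) -> None:
        self.branches: Dict[str, Commit] = {"main": Commit({}, None, "init")}
        self.staged: Dict[str, Dict[str, str]] = {"main": {}}

    def branch(self, name: str, source: str) -> None:
        # A new branch points at the same commit; no objects are copied.
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def write(self, branch: str, path: str, object_id: str) -> None:
        # Uncommitted change, visible only on this branch.
        self.staged[branch][path] = object_id

    def commit(self, branch: str, message: str) -> Commit:
        head = self.branches[branch]
        snapshot = {**head.snapshot, **self.staged[branch]}
        self.branches[branch] = Commit(snapshot, head, message)
        self.staged[branch] = {}
        return self.branches[branch]

    def merge(self, source: str, dest: str) -> None:
        # Simplification: the source side wins, with no conflict detection.
        src, dst = self.branches[source], self.branches[dest]
        merged = {**dst.snapshot, **src.snapshot}
        self.branches[dest] = Commit(merged, dst, f"merge {source}")

    def rollback(self, branch: str) -> None:
        # Move the branch pointer back to the previous commit, if any.
        head = self.branches[branch]
        if head.parent is not None:
            self.branches[branch] = head.parent


repo = ToyRepo()
repo.write("main", "events/part-0.parquet", "obj-a")
repo.commit("main", "initial load")

repo.branch("etl", "main")                       # isolated workspace
repo.write("etl", "events/part-1.parquet", "obj-b")
repo.commit("etl", "add part-1")

main_before = sorted(repo.branches["main"].snapshot)
repo.merge("etl", "main")                        # publish the change
main_after = sorted(repo.branches["main"].snapshot)
repo.rollback("main")                            # undo the merge
main_rolled_back = sorted(repo.branches["main"].snapshot)
```

In this sketch, `main` is unaffected by writes on `etl` until the merge, and the rollback restores the earlier snapshot by moving a pointer rather than rewriting objects. Deletions, conflict handling, and persistence are deliberately left out.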
Syllabus
Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects (Amit Kesarwani)
Taught by
The ASF
Related Courses
Data Lakes for Big Data
EdCast
Distributed Computing with Spark SQL
University of California, Davis via Coursera
Modernizing Data Lakes and Data Warehouses with Google Cloud
Google Cloud via Coursera
Data Engineering with AWS
Udacity
Preparing for Google Cloud Certification: Cloud Data Engineer
Google Cloud via Coursera