YoVDO

Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects

Offered By: The ASF via YouTube

Tags

Data Lakes Courses, Git Courses, Apache Spark Courses, Data Management Courses, Scalability Courses, LakeFS Courses

Course Description

Overview

Explore how lakeFS scales Git-like data versioning to billions of objects in modern data lake architectures in this 43-minute conference talk from ApacheCon 2022. Learn about the challenges of using object storage for data lakes and discover how lakeFS introduces Git-inspired operations (branching, committing, merging, and rolling back changes) to ensure data quality and resiliency. Understand how lakeFS's Git-like data model scales to petabytes of data across billions of objects without compromising throughput. Watch a demonstration of branching, writing data with Spark, and merging on a billion-object repository. Gain insights into solving common data lake problems and improving data management practices for large-scale object storage systems.

Syllabus

Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects - Amit Kesarwani


Taught by

The ASF

Related Courses

Multi-Table Transactions with LakeFS and Delta Lake - Tech Talk
Databricks via YouTube
CI/CD for Data - Building Dev/Test Data Environments with Open Source Stacks
CNCF [Cloud Native Computing Foundation] via YouTube
Building Reproducible ML Processes with an Open Source Stack
Linux Foundation via YouTube
Power Up Your Lakehouse with Git Semantics and Delta Lake
Databricks via YouTube
Version Control for Lakehouse Architecture - Essential Practices and Benefits
Databricks via YouTube