YoVDO

Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects

Offered By: The ASF via YouTube

Tags

Data Lakes Courses, Git Courses, Apache Spark Courses, Data Management Courses, Scalability Courses, LakeFS Courses

Course Description

Overview

Explore how lakeFS scales Git-like data versioning to billions of objects in modern data lake architectures during this 43-minute conference talk from ApacheCon 2022. Learn about the challenges of using object storage for data lakes and discover how lakeFS introduces Git-inspired concepts such as branching, committing, merging, and rolling back changes to ensure data quality and resiliency. Understand the scalability of lakeFS's Git-like data model for petabyte-scale data across billions of objects without compromising throughput or performance. Witness a demonstration of branching, writing data using Spark, and merging on a billion-object repository. Gain insights into solving common data lake problems and enhancing data management practices for large-scale object storage systems.

Syllabus

Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects (Amit Kesarwani)


Taught by

The ASF

Related Courses

Financial Sustainability: The Numbers side of Social Enterprise
+Acumen via NovoEd
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Developing Repeatable Models® to Scale Your Impact
+Acumen via Independent
Managing Microsoft Windows Server Active Directory Domain Services
Microsoft via edX
Introduction aux conteneurs
Microsoft Virtual Academy via OpenClassrooms