YoVDO

Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects

Offered By: The ASF via YouTube

Tags

Data Lakes Courses, Git Courses, Apache Spark Courses, Data Management Courses, Scalability Courses, lakeFS Courses

Course Description

Overview

Explore how lakeFS scales Git-like data versioning to billions of objects in modern data lake architectures during this 43-minute conference talk from ApacheCon 2022. Learn about the challenges of using object storage for data lakes and discover how lakeFS introduces Git-inspired concepts such as branching, committing, merging, and rolling back changes to ensure data quality and resiliency. Understand the scalability of lakeFS's Git-like data model for petabyte-scale data across billions of objects without compromising throughput or performance. Witness a demonstration of branching, writing data using Spark, and merging on a billion-object repository. Gain insights into solving common data lake problems and enhancing data management practices for large-scale object storage systems.

Syllabus

Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects (Amit Kesarwani)


Taught by

The ASF

Related Courses

Données et services numériques, dans le nuage et ailleurs
Certificat informatique et internet via France Université Numérique
Introduction to Digital Curation
University College London via Independent
Excel Avanzado
Miríadax
SAP Business Warehouse powered by SAP HANA
SAP Learning
Programming Mobile Applications for Android Handheld Systems: Part 2
University of Maryland, College Park via Coursera