YoVDO

Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects

Offered By: The ASF via YouTube

Tags

Data Lakes Courses, Git Courses, Apache Spark Courses, Data Management Courses, Scalability Courses, LakeFS Courses

Course Description

Overview

Explore how lakeFS scales Git-like data versioning to billions of objects in modern data lake architectures during this 43-minute conference talk from ApacheCon 2022. Learn about the challenges of using object storage for data lakes and discover how lakeFS introduces Git-inspired concepts such as branching, committing, merging, and rolling back changes to ensure data quality and resiliency. Understand the scalability of lakeFS's Git-like data model for petabyte-scale data across billions of objects without compromising throughput or performance. Witness a demonstration of branching, writing data using Spark, and merging on a billion-object repository. Gain insights into solving common data lake problems and enhancing data management practices for large-scale object storage systems.
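
For readers new to the Git-inspired concepts named above (branching, committing, merging, and rolling back), the following is a minimal in-memory sketch in Python. It is a toy model only and does not use or reflect the lakeFS API or its internal data model: the class ToyRepo, its method names, and the object paths are all hypothetical, chosen for illustration. It shows the one idea that makes branching cheap: a branch is a pointer to a commit, so creating one copies no data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical toy model of Git-like versioning; not the lakeFS API."""
    # commit id -> snapshot mapping object path -> content
    commits: dict = field(default_factory=lambda: {0: {}})
    # branch name -> id of the commit the branch points to
    branches: dict = field(default_factory=lambda: {"main": 0})
    # branch name -> uncommitted writes
    staged: dict = field(default_factory=dict)

    def create_branch(self, name, source):
        # Branching copies a pointer, not the objects themselves.
        self.branches[name] = self.branches[source]

    def write(self, branch, path, content):
        # Writes stay isolated on the branch until committed.
        self.staged.setdefault(branch, {})[path] = content

    def _add_commit(self, branch, snapshot):
        new_id = len(self.commits)
        self.commits[new_id] = snapshot
        self.branches[branch] = new_id
        return new_id

    def commit(self, branch):
        # Toy simplification: copy the whole parent snapshot.
        # A system built for billions of objects would not do this.
        snapshot = dict(self.commits[self.branches[branch]])
        snapshot.update(self.staged.pop(branch, {}))
        return self._add_commit(branch, snapshot)

    def merge(self, source, dest):
        # Toy simplification: source wins, no conflict detection.
        merged = dict(self.commits[self.branches[dest]])
        merged.update(self.commits[self.branches[source]])
        return self._add_commit(dest, merged)

    def rollback(self, branch, commit_id):
        # Rolling back just moves the branch pointer to an older commit.
        self.branches[branch] = commit_id

    def read(self, branch, path):
        return self.commits[self.branches[branch]].get(path)


repo = ToyRepo()
repo.write("main", "events/part-0.parquet", "v1")
first = repo.commit("main")

repo.create_branch("etl-test", "main")
repo.write("etl-test", "events/part-1.parquet", "v2")
repo.commit("etl-test")
visible_on_main_before_merge = repo.read("main", "events/part-1.parquet")

repo.merge("etl-test", "main")
visible_on_main_after_merge = repo.read("main", "events/part-1.parquet")

repo.rollback("main", first)
visible_on_main_after_rollback = repo.read("main", "events/part-1.parquet")
```

In this sketch the new object is invisible on main until the merge, and the rollback restores main to its first commit without deleting any stored snapshot. The talk itself covers how a real system keeps these operations efficient at scale, which this toy does not attempt.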

Syllabus

Git for Data Lakes: How lakeFS Scales Data Versioning to Billions of Objects (Amit Kesarwani)


Taught by

The ASF

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera