Demystifying Delta Lake - Data Reliability for Data Lakes

Offered By: Databricks via YouTube

Tags

Delta Lake Courses
Apache Spark Courses
Data Lakes Courses
Batch Data Processing Courses
Streaming Data Processing Courses

Course Description

Overview

In this interview, Michael Armbrust, the original creator of Spark SQL and a key figure in Apache Spark development, explains how Delta Lake brings ACID transactions, scalable metadata handling, and unified streaming and batch data processing to existing data lakes. The conversation covers Delta Lake's compatibility with the Apache Spark APIs and the reasons behind its creation, then moves through making streaming first-class, automatic schema migration, multi-version concurrency control, and troubleshooting slow queries. It also explains the difference between the vacuum and optimize operations, discusses GDPR and COPPA compliance, compares Z-Ordering with partitioning, and closes with a look at the Delta Lake roadmap. This 26-minute episode is part of Data Brew, a series of straight-talking discussions on the evolution of Data + AI.

Syllabus

Intro
Welcome
Why Delta Lake
Making Streaming First Class
Why isn't Delta built into Spark
Delta roadmap
Why Delta
Challenges
Automatic Schema Migration
Why is Delta important
Multi-version concurrency control
Listing files
Troubleshooting slow queries
Auto optimize
Vacuum vs Optimize
GDPR/COPPA Compliance
Z-Ordering vs Partitioning
Delta Lake Roadmap


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera