Building Data Intensive Analytic Applications on Top of Delta Lakes

Offered By: Databricks via YouTube

Tags

Data Analysis, Apache Spark, Data Lakes, Data Engineering, Delta Lake

Course Description

Overview

Explore data reliability and performance in big data workloads in this 43-minute tutorial on building data-intensive analytic applications with Delta Lake, an open-source storage layer that brings ACID transactions to Apache Spark™. Learn the requirements of modern data engineering, the key data reliability challenges data engineers face, and how Delta Lake fits within an Apache Spark™ environment to address them at scale. Through presentations, code examples, and interactive notebooks, see practical ways to apply these reliability improvements to your own data architecture. Topics include data lakes, streaming, schema evolution, and merge operations, each explored through hands-on examples of Delta Lake's features.

Syllabus

Introduction
Data Lakes
Typical Data Lake Project
Who uses Delta
Getting started
Data
Download Data
Parquet Table
Stop Streaming
Initializing Streaming
Working with Parquet
Using Delta Lake
Streaming Job
Multiple Streaming Queries
Counting Continuously
Schema Evolution
Merged Schema
Summary
History
Vacuum
Mods
Merge
Update Data
Define DataFrame
Merge Syntax
Random Data
For Each Batch
Summarize
Community
Question
Thank you


Taught by

Databricks

Related Courses

Distributed Computing with Spark SQL
University of California, Davis via Coursera
Apache Spark (TM) SQL for Data Analysts
Databricks via Coursera
Building Your First ETL Pipeline Using Azure Databricks
Pluralsight
Implement a data lakehouse analytics solution with Azure Databricks
Microsoft via Microsoft Learn
Perform data science with Azure Databricks
Microsoft via Microsoft Learn