Building Data Intensive Analytic Applications on Top of Delta Lakes
Offered By: Databricks via YouTube
Course Description
Overview
Explore data reliability and performance in big data workloads in this 43-minute tutorial on building data-intensive analytic applications with Delta Lake. Learn how Delta Lake, an open-source storage layer, brings ACID transactions to Apache Spark™ and addresses key challenges faced by data engineers. Discover what modern data engineering requires and how Delta Lake improves data reliability at scale. Through presentations, code examples, and interactive notebooks, see where Delta Lake fits within an Apache Spark™ environment and how to apply it to your own data architecture, with hands-on coverage of data lakes, streaming, schema evolution, and merge operations.
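As a taste of the notebook material, here is a minimal PySpark sketch of the core move the talk makes: Delta Lake is a drop-in storage format, so migrating from Parquet is largely a matter of swapping format("parquet") for format("delta"). The session configuration and /tmp paths below are illustrative assumptions, not code from the talk, and running it requires the delta-spark package.

    from pyspark.sql import SparkSession

    # Assumed setup: a session with the Delta Lake extensions enabled
    # (requires the delta-spark package on the classpath).
    spark = (
        SparkSession.builder
        .appName("delta-quickstart")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Read an existing Parquet dataset (hypothetical path)...
    df = spark.read.format("parquet").load("/tmp/events_parquet")

    # ...and rewrite it as a Delta table: same DataFrame API, but every
    # write now goes through an ACID transaction log.
    df.write.format("delta").mode("overwrite").save("/tmp/events_delta")

    # Reads use the same API, just with format("delta").
    spark.read.format("delta").load("/tmp/events_delta").show()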
Syllabus
Introduction
Data Lakes
Typical Data Lake Project
Who uses Delta
Getting started
Data
Download Data
Parquet Table
Stop Streaming
Initializing Streaming
Working with Parquet
Using Delta Lake
Streaming Job
Multiple Streaming Queries
Counting Continuously
Schema Evolution
Merge Schema
Summary
History
Vacuum
Mods
Merge
Update Data
Define DataFrame
Merge Syntax
Random Data
For Each Batch
Summarize
Community
Question
Thank you
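The streaming chapters (Initializing Streaming through Counting Continuously) show Delta working as both a sink and a source for Structured Streaming. A minimal sketch of that pattern, with Spark's built-in rate source standing in for the talk's data and hypothetical paths:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # The built-in "rate" source emits (timestamp, value) rows continuously.
    stream = (spark.readStream.format("rate")
        .option("rowsPerSecond", 5)
        .load())

    # Stream into a Delta table; the transaction log lets batch readers
    # and other streams see consistent snapshots of the same table.
    query = (stream.writeStream.format("delta")
        .option("checkpointLocation", "/tmp/rate_delta_chk")
        .outputMode("append")
        .start("/tmp/rate_delta"))

    # The same Delta table can also be read back as a streaming source:
    # spark.readStream.format("delta").load("/tmp/rate_delta")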
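The Schema Evolution and Merge Schema chapters demo Delta Lake's mergeSchema option. A sketch of the idea, reusing the hypothetical /tmp/events_delta table from above rather than the presenter's exact notebook code:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # New rows carry an extra "country" column not in the table schema.
    new_rows = spark.createDataFrame(
        [(42, "click", "US")], ["id", "action", "country"]
    )

    # Without mergeSchema, Delta rejects this append as a schema mismatch;
    # with it, the table schema evolves to include the new column.
    (new_rows.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .save("/tmp/events_delta"))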
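The History and Vacuum chapters cover auditing and cleanup. A sketch under the same assumptions, using the delta-spark Python API:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()
    tbl = DeltaTable.forPath(spark, "/tmp/events_delta")

    # Every write is a versioned commit in the transaction log;
    # history() returns the commits as a DataFrame.
    tbl.history().select("version", "timestamp", "operation").show()

    # vacuum() removes data files no longer referenced by any table
    # version inside the retention window (in hours; 168 = 7 days).
    tbl.vacuum(168)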
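Finally, the Merge, Merge Syntax, and For Each Batch chapters walk through upserts. A sketch of both the batch and the streaming variants, with hypothetical table and column names:

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()
    target = DeltaTable.forPath(spark, "/tmp/events_delta")
    updates = spark.createDataFrame([(42, "purchase")], ["id", "action"])

    # Batch upsert: update rows whose id matches, insert the rest,
    # all in one atomic commit.
    (target.alias("t")
        .merge(updates.alias("u"), "t.id = u.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

    # Streaming upsert: run the same merge once per micro-batch.
    def upsert_batch(batch_df, batch_id):
        (target.alias("t")
            .merge(batch_df.alias("u"), "t.id = u.id")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())

    # stream_df would be a streaming DataFrame with matching columns:
    # stream_df.writeStream.foreachBatch(upsert_batch).start()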
Taught by
Databricks
Related Courses
Social Network Analysis - University of Michigan via Coursera
Intro to Algorithms - Udacity
Data Analysis - Johns Hopkins University via Coursera
Computing for Data Analysis - Johns Hopkins University via Coursera
Health in Numbers: Quantitative Methods in Clinical & Public Health Research - Harvard University via edX