YoVDO

How Apache Spark 3.0 and Delta Lake Enhance Data Lake Reliability

Offered By: Databricks via YouTube

Course Description

Overview

This Seattle Spark + AI Meetup video covers the performance improvements in Apache Spark 3.0 and the reliability features that Delta Lake adds to a data lake. On the Spark side, it walks through Adaptive Query Execution (AQE), Dynamic Partition Pruning (DPP), and the handling of skewed queries, with the query performance gains of the new AQE framework illustrated on a 3TB TPC-DS benchmark. It explains how DPP speeds up queries by pruning partitions in star schema designs, and touches on the Spark Catalyst Optimizer, logical and physical planning, broadcast hash joins, coalescing, split partitioning, and the traditional data warehousing problem. On the Delta Lake side, it covers ACID transactions, Schema Enforcement, and Time Travel, along with the Catalog APIs, SQL statement support, partial rights, the Data Quality Framework, and improved performance.
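For readers who want to try the Spark 3.0 features named in the overview, the sketch below lists the Spark SQL configuration properties that switch them on and renders them as `spark-submit` arguments. It is not taken from the talk: the property names are standard Spark SQL settings, while the dictionary name `SPARK_3_FEATURE_CONF` and the helper `to_submit_args` are hypothetical names introduced here for illustration. AQE is off by default in Spark 3.0, so it has to be enabled explicitly.

```python
# Spark SQL properties for the Spark 3.0 features named in the overview.
# The keys are real Spark settings; the dict and helper names are
# hypothetical and exist only for this illustration.
SPARK_3_FEATURE_CONF = {
    # Adaptive Query Execution (disabled by default in Spark 3.0)
    "spark.sql.adaptive.enabled": "true",
    # Let AQE merge small shuffle partitions after a stage finishes
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Let AQE split skewed partitions in sort-merge joins
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Dynamic Partition Pruning
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
}


def to_submit_args(conf):
    """Render a config dict as `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(SPARK_3_FEATURE_CONF)))
```

The same keys can be set on a running session instead of on the command line; the flat dictionary keeps the example free of any Spark dependency.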

Syllabus

Introduction
Who is Danny
Free Download
Databricks
Download the book
Adaptive Query Execution
Apache Spark 3.0
Performance
Spark Catalyst Optimizer
Logical and Physical Planning
AQE Fundamentals
Broadcast Hash Joins
Why not always broadcast join
Dynamically switch join strategies
Flipping the switch
Off script partitioning
Coalescence
Table Size
Coalescing
Traditional Data Warehousing Problem
Split Partitioning
Q&A Questions
Dynamic Partition Pruning
Dynamic Partition Pruning Before Optimization
Filter Scan
Results
Pseudo Rush
Building Ecosystem
Data Lake Reliability
Catalog API
SQL Statement Support
Partial Rights
Delete
Delete from Events
History Retention
Data Source v2 Catalog API
Data Quality Framework
Improved Performance
More About Delta
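
The Coalescing item in the syllabus refers to merging small shuffle partitions into fewer, larger ones. The toy model below illustrates the general idea with a greedy pass over adjacent partition sizes. It is a simplified sketch, not Spark's implementation, and the function name `coalesce_partitions`, the sample sizes, and the 64 MB target are assumptions chosen for illustration.

```python
def coalesce_partitions(sizes_mb, target_mb):
    """Greedily merge adjacent partitions toward a target size.

    A group is closed as soon as adding the next partition would push it
    past target_mb. A single partition larger than the target stays in a
    group of its own. Returns a list of groups of partition indices.
    """
    groups = []
    current, current_size = [], 0
    for index, size in enumerate(sizes_mb):
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Seven small shuffle partitions (sizes in MB, made up for the example)
    sizes = [10, 5, 40, 30, 20, 8, 2]
    print(coalesce_partitions(sizes, target_mb=64))
    # → [[0, 1, 2], [3, 4, 5, 6]]
```

In this example, seven small partitions collapse into two groups of 55 MB and 60 MB, which means two tasks instead of seven for the next stage.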


Taught by

Databricks
