
Adaptive Query Execution: Speeding Up Spark SQL at Runtime

Offered By: Databricks via YouTube

Tags

Apache Spark Courses
Data Analytics Courses
Distributed Computing Courses

Course Description

Overview

Explore the Adaptive Query Execution (AQE) framework introduced in Spark 3.0 through this 46-minute Databricks conference talk. Dive into how this feature tackles performance problems by re-optimizing and adjusting query plans based on statistics collected at runtime. Learn about statistics-guided optimizations such as partition coalescing and dynamic join strategy selection, and see their impact through practical query examples. Understand how these improvements address outdated data statistics and inaccurate cardinality estimates in Spark SQL. See the performance gains achieved on the TPC-DS benchmark with Adaptive Query Execution, and gain insight into how this framework speeds up Spark SQL queries at runtime.
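To make "partition coalescing" concrete, here is a minimal sketch of the idea in plain Python, assuming we already know the size of each shuffle partition after a stage finishes. It greedily merges adjacent partitions until adding the next one would push a group past a target size. This is a simplified illustration, not Spark's actual implementation (which also handles multiple shuffles, minimum partition sizes, and other details); the function name `coalesce_partitions` and the sizes used are hypothetical.

```python
from typing import List


def coalesce_partitions(sizes: List[int], target_size: int) -> List[List[int]]:
    """Group contiguous shuffle partition indices so that each group's total
    size stays at or below target_size. A single partition larger than the
    target still forms its own group (it is never split here).

    Hypothetical helper for illustration; not a Spark API.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for idx, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Seven small/uneven shuffle partitions (sizes in MB, chosen for illustration)
    # collapse into three reduce tasks with a 64 MB target.
    print(coalesce_partitions([10, 5, 60, 20, 30, 2, 3], 64))
```

With the example sizes, the seven partitions become three groups, `[[0, 1], [2], [3, 4, 5, 6]]`, so three reduce tasks run instead of seven. Only adjacent partitions are merged, which keeps each task reading a contiguous range of shuffle output.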

Syllabus

Intro
Agenda
Adaptive Query Execution
Optimizations Overview
Partition Coalescing
Dynamic Join Strategy Selection
Enabling AQE
Sales Table
Dynamically collapsing shuffle partitions
Demo of collapsing shuffle partitions
Demo of dynamically optimizing the query
Performance result
Dynamically collapsing shuffle partitions
Dynamically switching join strategies
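The "dynamically switching join strategies" topic can be sketched as a small decision function, assuming the runtime sizes of both join inputs are known once their query stages complete. If the smaller side is at or below a broadcast threshold, the plan switches to a broadcast hash join; otherwise it keeps a sort-merge join. This is a hedged simplification: it ignores join type (for example, which side an outer join may broadcast) and other factors Spark considers, and the names `JoinStrategy` and `choose_join_strategy` are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class JoinStrategy(Enum):
    SORT_MERGE = "sort-merge join"
    BROADCAST_HASH = "broadcast hash join"


def choose_join_strategy(
    left_bytes: int, right_bytes: int, broadcast_threshold: int
) -> Tuple[JoinStrategy, Optional[str]]:
    """Pick a join strategy from *runtime* input sizes.

    Returns the strategy and which side to broadcast ("left" or "right"),
    or None when no side is broadcast. Hypothetical helper, not a Spark API;
    join-type restrictions are deliberately ignored.
    """
    smaller_side = "left" if left_bytes <= right_bytes else "right"
    smaller_size = min(left_bytes, right_bytes)
    if smaller_size <= broadcast_threshold:
        return JoinStrategy.BROADCAST_HASH, smaller_side
    return JoinStrategy.SORT_MERGE, None


if __name__ == "__main__":
    MB = 1024 * 1024
    # Static planning: a filtered table was *estimated* at 500 MB, so sort-merge.
    print(choose_join_strategy(500 * MB, 2000 * MB, 10 * MB))
    # At runtime the filter turned out to be selective: only 8 MB actually remain.
    print(choose_join_strategy(8 * MB, 2000 * MB, 10 * MB))
```

The point of the example is the contrast between the two calls: the same join gets a different strategy once the planner sees the actual post-filter size instead of a stale or inaccurate estimate.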


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera