YoVDO

Improving Broadcast Joins in Apache Spark SQL

Offered By: Databricks via YouTube

Tags

Memory Management Courses
ETL Pipelines Courses

Course Description

Overview

This 28-minute Databricks conference talk covers broadcast joins in Apache Spark SQL. It explains how Spark's execution engine runs broadcast joins and what that means for performance. It then describes the changes Workday made to raise the threshold at which broadcast joins remain effective, including executor-side broadcasting and modifications to Spark's whole-stage code generator, along with techniques for limiting executor memory usage while the broadcast threshold is increased. The talk closes with production case studies from large-scale ETL pipelines, which you can use as a reference when tuning joins in your own Spark workloads.
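For readers new to the topic, the idea behind a broadcast join can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Spark's implementation and not the code shown in the talk: the function name `broadcast_hash_join` and the sample `orders` and `customers` data are hypothetical, and partitions are modelled as simple lists of dictionaries.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join every partition of the large side against one shared hash table."""
    # Build phase: hash the small side once. In a real engine this table
    # is the object that would be shipped (broadcast) to every executor.
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)

    # Probe phase: each partition is scanned where it lives, so the large
    # side never has to be shuffled across the cluster.
    joined = []
    for partition in large_partitions:
        for row in partition:
            for match in table.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical sample data: two partitions of a large "orders" table
# and a small "customers" table.
orders = [
    [{"cust_id": 1, "amount": 30}, {"cust_id": 2, "amount": 15}],
    [{"cust_id": 1, "amount": 5}, {"cust_id": 3, "amount": 99}],
]
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Lin"}]

result = broadcast_hash_join(orders, customers, "cust_id")
```

The cost of this approach is that every worker holds a full copy of the hash table, which is why memory usage becomes the limiting factor as the small side grows. In Spark SQL the size limit for automatic broadcasting is controlled by `spark.sql.autoBroadcastJoinThreshold`, whose default is 10 MB.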

Syllabus

Intro
How Spark Works
What is Broadcast Join
How Broadcast Joins Work
Improving Broadcast Joins
Single Join
Executors
Results
Production case study
Conclusion


Taught by

Databricks

Related Courses

Heterogeneous Parallel Programming
University of Illinois at Urbana-Champaign via Coursera
Advanced Operating Systems
Georgia Institute of Technology via Udacity
計算機程式設計 (Computer Programming)
National Taiwan University via Coursera
Introduction to Operating Systems
Georgia Institute of Technology via Udacity
Android Performance
Google via Udacity