Care and Feeding of Catalyst Optimizer - Practical Troubleshooting for Spark SQL

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Performance Tuning Courses, Spark SQL Courses

Course Description

Overview

Explore Spark's Catalyst query optimizer in this 42-minute talk from Databricks. The talk covers practical challenges and edge cases that arise when working with Spark SQL, focusing on issues that can only be diagnosed and solved with a solid understanding of Spark internals. Learn how to handle cases where UDFs unexpectedly become expensive and cause skew, how to deal with generated code (codegen) stages that exceed the 64k method limit, and how tuning the JVM code cache can improve Spark application performance. Through a series of puzzles and real-world examples, develop a deeper understanding of the Catalyst Optimizer and become better at troubleshooting and optimizing complex queries.

Syllabus

Introduction
Overview
What Is the Catalyst Optimizer
Case Study: Groundhog Day
The Problem
Tuning Code Cache
Debugging Code
Metrics
Conclusion


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera