YoVDO

Understanding and Improving Code Generation in Spark

Offered By: Databricks via YouTube

Tags

Code Generation Courses
Memory Management Courses
Performance Tuning Courses

Course Description

Overview

This 24-minute conference talk explains how code generation works in Spark's physical execution engine. It contrasts expression codegen, which compiles individual expressions into Java code, with whole-stage codegen, which fuses a chain of physical operators into a single generated function. It also describes how Workday improved code generation to handle complex queries. Large generated functions cause several problems: out-of-memory (OOM) errors, the JVM's 64 KB limit on the bytecode of a single method, and performance regressions. The talk presents an approach that splits the collapsed functions produced by whole-stage codegen while keeping its performance benefits, and reports the resulting performance improvements on production workloads. It moves from the Volcano iterator model to whole-stage code generation and covers the problems encountered and the solutions implemented along the way.
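The contrast between the Volcano iterator model and whole-stage code generation can be sketched in a few lines. The Python below is a conceptual analogue, not Spark code. The operator classes `Scan`, `Filter`, `Project` and the helpers `run_volcano` and `generate_fused` are hypothetical.

* In the Volcano style, each operator pulls one row at a time from its child through a per-row `next_row()` call.
* In the fused style, source code for the whole pipeline is generated as text and compiled once into a single loop. This removes the per-row calls between operators.

Spark itself generates and compiles Java rather than Python.

```python
from typing import Callable, Dict, List, Optional


class Scan:
    """Leaf operator: returns input rows one at a time."""

    def __init__(self, rows: List[int]) -> None:
        self._it = iter(rows)

    def next_row(self) -> Optional[int]:
        # None signals end of input (rows are assumed to be non-None ints).
        return next(self._it, None)


class Filter:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child, pred: Callable[[int], bool]) -> None:
        self.child = child
        self.pred = pred

    def next_row(self) -> Optional[int]:
        row = self.child.next_row()
        while row is not None:
            if self.pred(row):
                return row
            row = self.child.next_row()
        return None


class Project:
    """Applies a function to each row pulled from its child."""

    def __init__(self, child, fn: Callable[[int], int]) -> None:
        self.child = child
        self.fn = fn

    def next_row(self) -> Optional[int]:
        row = self.child.next_row()
        return None if row is None else self.fn(row)


def run_volcano(rows: List[int]) -> List[int]:
    # Equivalent of SELECT x * 2 FROM t WHERE x > 10, with one call
    # per operator per row (the Volcano pull model).
    plan = Project(Filter(Scan(rows), lambda x: x > 10), lambda x: x * 2)
    out: List[int] = []
    row = plan.next_row()
    while row is not None:
        out.append(row)
        row = plan.next_row()
    return out


def generate_fused(pred_src: str, proj_src: str) -> Callable[[List[int]], List[int]]:
    # Emit source for a single loop that inlines the filter and the
    # projection, then compile it once (whole-stage style fusion).
    src = (
        "def fused(rows):\n"
        "    out = []\n"
        "    for x in rows:\n"
        f"        if {pred_src}:\n"
        f"            out.append({proj_src})\n"
        "    return out\n"
    )
    namespace: Dict[str, object] = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["fused"]  # type: ignore[return-value]
```

For example, `generate_fused("x > 10", "x * 2")` returns a function that maps `[5, 12, 20, 3]` to `[24, 40]`. This is the same result as `run_volcano([5, 12, 20, 3])`, computed in one loop with no calls between operators.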

Syllabus

Intro
Volcano Iterator Model
Whole-Stage Code Generation
Problems
Solution
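The general idea of splitting an oversized generated function can be illustrated in the same hedged way. This sketch rests on two assumptions:

* The generated body is a flat list of statements that communicate only through a shared mutable state object (`s`).
* A statement count (`max_stmts`) stands in for the JVM's 64 KB bytecode limit per method.

The names `split_statements`, `generate_split`, `compile_main`, `_partN`, and `main` are hypothetical. This is not Workday's or Spark's implementation. It only shows one large function being broken into smaller ones that are called in sequence.

```python
from typing import Callable, Dict, List


def split_statements(stmts: List[str], max_stmts: int) -> List[List[str]]:
    """Chunk statements so that no generated helper exceeds max_stmts lines."""
    if max_stmts < 1:
        raise ValueError("max_stmts must be positive")
    return [stmts[i:i + max_stmts] for i in range(0, len(stmts), max_stmts)]


def generate_split(stmts: List[str], max_stmts: int) -> str:
    """Emit source with one helper per chunk plus a main() that calls them in order."""
    chunks = split_statements(stmts, max_stmts)
    lines: List[str] = []
    for i, chunk in enumerate(chunks):
        lines.append(f"def _part{i}(s):")
        lines.extend(f"    {stmt}" for stmt in chunk)
        lines.append("")
    lines.append("def main(s):")
    # State lives in the shared dict s, so calling the parts in sequence
    # preserves the semantics of the original single function body.
    lines.extend(f"    _part{i}(s)" for i in range(len(chunks)))
    lines.append("    return s")
    return "\n".join(lines) + "\n"


def compile_main(src: str) -> Callable[[Dict[str, int]], Dict[str, int]]:
    """Compile the generated source and return its main() entry point."""
    namespace: Dict[str, object] = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["main"]  # type: ignore[return-value]
```

Consider a body of 101 statements built as `["s['acc'] = 0"] + [f"s['acc'] += {i}" for i in range(1, 101)]`. With `max_stmts=30`, this is split into four helpers of 30, 30, 30, and 11 statements, and `compile_main(...)({})["acc"]` evaluates to 5050.

Passing state through a shared object is what makes this split trivially safe. In generated Java, local variables that cross a split boundary would typically have to become method parameters or class fields instead. That is one reason splitting whole-stage code without losing its performance benefits is non-trivial.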


Taught by

Databricks

Related Courses

Compilers
Stanford University via Coursera
Build a Modern Computer from First Principles: Nand to Tetris Part II (project-centered course)
Hebrew University of Jerusalem via Coursera
Developing Web Services in Go: Language Fundamentals
Moscow Institute of Physics and Technology via Coursera
Complete Guide to Protocol Buffers 3 [Java, Golang, Python]
Udemy
Angular tooling: Generating code with schematics
Coursera Project Network via Coursera