YoVDO

How to Automate Performance Tuning for Apache Spark

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Performance Tuning Courses, Scalability Courses, Data Pipelines Courses

Course Description

Overview

Discover how to streamline and automate performance tuning for Apache Spark in this 41-minute conference talk by Jean-Yves Stephan of Data Mechanics. Learn about the challenges of keeping data pipelines efficient and stable in production: selecting appropriate infrastructure, configuring Spark correctly, and staying scalable as data volumes grow. The talk reviews the key information and parameters needed for manual tuning, then surveys automation options ranging from open-source tools to managed services. It covers common issues such as lack of parallelism, shuffle spill, and data skew, and shows how node metrics can guide improvements. It also addresses the iterative nature of performance tuning, cost-speed trade-offs, and the architecture and algorithms behind automated tuning tools. By the end, you should understand how to optimize your Spark applications and meet SLAs efficiently, even as you scale to hundreds or thousands of jobs.
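
To give a feel for the three common issues named above, here is a minimal sketch of a rule-based check over per-stage metrics. It is not taken from the talk or from any Data Mechanics tool: the function name `diagnose_stage`, its parameters, and the skew threshold of 3.0 are all hypothetical and chosen only for illustration. It uses plain Python so it runs without a Spark installation.

```python
from statistics import median


def diagnose_stage(task_durations_s, spill_bytes, total_cores, skew_ratio=3.0):
    """Return a list of suspected issues for one stage.

    All inputs and thresholds are illustrative assumptions:
    task_durations_s -- list of task run times in seconds
    spill_bytes      -- total bytes spilled to disk during the stage
    total_cores      -- executor cores available to the application
    skew_ratio       -- max/median task time above which skew is flagged
    """
    issues = []

    # Lack of parallelism: fewer tasks than cores leaves some cores idle.
    if len(task_durations_s) < total_cores:
        issues.append("lack of parallelism")

    # Shuffle spill: any spilled bytes suggest partitions too large for memory.
    if spill_bytes > 0:
        issues.append("shuffle spill")

    # Data skew: the slowest task runs far longer than the median task.
    if task_durations_s:
        med = median(task_durations_s)
        if med > 0 and max(task_durations_s) / med > skew_ratio:
            issues.append("data skew")

    return issues


if __name__ == "__main__":
    print(diagnose_stage([10, 11, 9, 60], spill_bytes=0, total_cores=4))
    print(diagnose_stage([10, 10], spill_bytes=1024, total_cores=8))
```

In practice the inputs would come from Spark's own task and stage metrics, and any real thresholds would need tuning per workload; the fixed values here are placeholders.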

Syllabus

Intro
What is performance tuning?
Why automate performance tuning?
Perf tuning is an iterative process
Common issues: lack of parallelism
Common issues: shuffle spill
Improvements based on node metrics
Cost-speed trade-off
Recap: manual perf tuning
Open source tuning tools
Motivations
Architecture (tech)
Architecture (algo)
Heuristics example
Evaluator
Experiment manager
Data Mechanics platform
Common issues: data skew
Impact of automated tuning


Taught by

Databricks
