Fine-Tuning and Enhancing Performance of Apache Spark Jobs

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Garbage Collection Courses, Serialization Courses

Course Description

Overview

Dive into best practices for fine-tuning and enhancing Apache Spark job performance in this 25-minute video from Databricks. Explore real-world problem-solving techniques and learn how to optimize resource usage by adjusting settings such as garbage collector selection, serialization, the number of workers and executors, data partitioning, and Java heap size. Analyze execution DAGs in the Spark UI to identify bottlenecks, optimize joins, and manage partition sizes. Discover strategies for handling data skew, using scheduling pools, and enabling the fair scheduler. Gain insights into Spark SQL rollup best practices and learn which approaches to avoid for better performance.

Syllabus

Intro
Our Setup
Configuring Cluster Test change with
Cache/Persist
Join Optimization
Filter Trick
Salting - Reduce Skew
Things to remember
Fair Scheduling
Serialization
Enable GC Logging
ParallelGC (default)
Takeaways
Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera