Apache Spark? If Only It Worked

Offered By: Devoxx via YouTube

Tags

Devoxx Courses, Apache Spark Courses

Course Description

Overview

Explore common challenges and optimization techniques for Apache Spark in this 31-minute conference talk from Devoxx. Gain insights into dealing with skewed data, understanding Spark on YARN and its memory model, effective caching strategies, sizing executors, and achieving data locality. Learn from real-world examples and practical solutions to improve performance and stability in Spark applications. Discover a framework for troubleshooting and optimizing Spark jobs, covering topics such as RDD evaluation, execution plans, and debugging tools. Benefit from the speaker's extensive experience working with data infrastructure at companies like VRBO, Spotify, Truecaller, and Apple.

Syllabus

Introduction
My experience with Spark
Outline of the talk
What is Spark
RDD
Pipelines
Execution Unit
Executor
Executor size
Small executors
Spark memory model
Memory overhead
Shuffle
In practice
Spark UI
Execution Plan
Skewed Data
Locality
Check locality
RDD lazily evaluated
RDD calculation twice
Spark caching
Spark optimization
Map volumes
Improve shuffle
Recap
Debugging tools
Challenge
Use Case
Summary
Questions


Taught by

Devoxx

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera