YoVDO

Apache Spark? If Only It Worked

Offered By: Devoxx via YouTube

Tags

Devoxx Courses, Apache Spark Courses

Course Description

Overview

Explore common challenges and optimization techniques for Apache Spark in this 31-minute conference talk from Devoxx. Gain insights into dealing with skewed data, understanding Spark on YARN and its memory model, effective caching strategies, sizing executors, and achieving data locality. Learn from real-world examples and practical solutions to improve performance and stability in Spark applications. Discover a framework for troubleshooting and optimizing Spark jobs, covering topics such as RDD evaluation, execution plans, and debugging tools. Benefit from the speaker's extensive experience working with data infrastructure at companies like VRBO, Spotify, TrueCaller, and Apple.

Syllabus

Introduction
My experience with Spark
Outline of the talk
What is Spark
RDD
Pipelines
Execution Unit
Executor
Executor size
Small executors
Spark memory model
Memory overhead
Shuffle
In practice
Spark UI
Execution Plan
Skewed Data
Locality
Check locality
RDD lazily evaluated
RDD calculation twice
Spark caching
Spark optimization
Map volumes
Improve shuffle
Recap
Debugging tools
Challenge
Use Case
Summary
Questions


Taught by

Devoxx

Related Courses

Play by Play: Developing Microservices and Mobile Apps with JHipster (Pluralsight)
Software Archaeology - Learning from the Landing on the Moon (Devoxx via YouTube)
Create an Eco-Friendly World with Green Software Engineering (Devoxx via YouTube)
Platform Building for Data Mesh - Show Me How It Is Done (Devoxx via YouTube)
The Hitchhiker's Guide to Software Architecture and Design (Devoxx via YouTube)