Apache Spark? If Only It Worked
Offered By: Devoxx via YouTube
Course Description
Overview
Explore common challenges and optimization techniques for Apache Spark in this 31-minute conference talk from Devoxx. Gain insights into dealing with skewed data, understanding Spark on YARN and its memory model, effective caching strategies, sizing executors, and achieving data locality. Learn from real-world examples and practical solutions to improve performance and stability in Spark applications. Discover a framework for troubleshooting and optimizing Spark jobs, covering topics such as RDD evaluation, execution plans, and debugging tools. Benefit from the speaker's extensive experience working with data infrastructure at companies like VRBO, Spotify, TrueCaller, and Apple.
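To make the executor sizing and memory overhead topics concrete, here is a minimal sketch of the arithmetic behind a YARN container request for one Spark executor. It is not taken from the talk. It assumes Spark's documented defaults for executor memory overhead (10% of executor memory, with a 384 MiB floor) and ignores off-heap memory, PySpark memory, and YARN's rounding to its allocation increment. The function names are hypothetical and chosen for illustration only.

```python
# Sketch: estimate the YARN container size requested for one Spark executor.
# Assumes Spark's documented defaults for spark.executor.memoryOverhead;
# off-heap memory, PySpark memory and YARN rounding are deliberately ignored.

MIN_OVERHEAD_MIB = 384   # documented minimum overhead
OVERHEAD_FACTOR = 0.10   # documented default overhead fraction


def executor_overhead_mib(executor_memory_mib: int) -> int:
    """Overhead added on top of the executor heap (hypothetical helper)."""
    return max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FACTOR))


def container_request_mib(executor_memory_mib: int) -> int:
    """Heap plus overhead: roughly what is requested from YARN per executor."""
    return executor_memory_mib + executor_overhead_mib(executor_memory_mib)


if __name__ == "__main__":
    # Small heaps pay the fixed 384 MiB floor; larger heaps pay 10%.
    for heap_mib in (1024, 4096, 16384):
        print(heap_mib, executor_overhead_mib(heap_mib), container_request_mib(heap_mib))
```

Under these assumptions, a 1 GiB executor carries proportionally more overhead than a 16 GiB one, which is one reason many very small executors can waste cluster memory.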
Syllabus
Introduction
My experience with Spark
Outline of the talk
What is Spark
RDD
Pipelines
Execution Unit
Executor
Executor size
Small executors
Spark memory model
Memory overhead
Shuffle
In practice
Spark UI
Execution Plan
Skewed Data
Locality
Check locality
RDD lazily evaluated
RDD calculation twice
Spark caching
Spark optimization
Map volumes
Improve shuffle
Recap
Debugging tools
Challenge
Use Case
Summary
Questions
Taught by
Devoxx
Related Courses
Play by Play: Developing Microservices and Mobile Apps with JHipster (Pluralsight)
Software Archaeology - Learning from the Landing on the Moon (Devoxx via YouTube)
Create an Eco-Friendly World with Green Software Engineering (Devoxx via YouTube)
Platform Building for Data Mesh - Show Me How It Is Done (Devoxx via YouTube)
The Hitchhiker's Guide to Software Architecture and Design (Devoxx via YouTube)