YoVDO

Beyond Shuffling - Scaling Apache Spark

Offered By: Scala Days Conferences via YouTube

Tags

Scala Days Courses, Apache Spark Courses, Cluster Management Courses

Course Description

Overview

Explore advanced techniques for scaling Apache Spark in this 43-minute conference talk from Scala Days Berlin 2016. The talk covers best practices and code snippets for handling large datasets efficiently. Learn to use Spark counters for performance investigation, optimize key-value data operations, and replace groupByKey with memory-efficient alternatives. See how caching and checkpointing strategies can reduce execution time. Further topics include functional transformations using Spark Datasets, working in noisy cluster environments, and using Spark SQL for improved performance. The talk closes with validating Spark jobs using accumulators and a list of additional testing resources.

Syllabus

Intro
What is going to be covered
The different pieces of Spark
What is key skew and why do we care?
Well there is a bit of magic in the shuffle...
Iterator to Iterator transformations
Why is Spark SQL good for those things?
How much faster can it be?
How to avoid lineage explosions
Introducing Datasets
And functional style maps
Switching gears: Validating Spark jobs
Using an accumulator for validation
Validating records read matches our expectations
Additional Spark Testing Resources
Additional Spark Resources
Spark Videos


Taught by

Scala Days Conferences

Related Courses

Teaching Domain Specific Languages in Scala
Scala Days Conferences via YouTube
Why Dolly Is Just the Beginning for Open LLM Models
Scala Days Conferences via YouTube
Building Billion Node Graphs for Machine Learning
Scala Days Conferences via YouTube
How Does Incremental Compilation Work with Scala 3
Scala Days Conferences via YouTube
AI Assisted Development
Scala Days Conferences via YouTube