Apache Spark Performance Tuning and Best Practices

Offered By: NashKnolX via YouTube

Tags

Apache Spark Courses, Big Data Courses, Data Filtering Courses, Performance Tuning Courses, DataFrames Courses, User-Defined Functions Courses, Serialization Courses

Course Description

Overview

Learn best practices for running Apache Spark in production and tuning its system-level performance in this 42-minute tutorial. The talk starts with Spark's major performance bottlenecks, then covers caching and its drawbacks, and the advantages and architecture of broadcast operations. It goes on to serialization, DataFrame operations, and user-defined functions (UDFs). Next come data filtering, shuffle reduction, and the choice of file format. It closes with executor optimization, memory tuning, and handling out-of-memory errors, so you can tune Spark jobs for real-world workloads.

Syllabus

Intro
What is Spark
How to optimize
Major bottlenecks
Spark Caching
Spark Caching Demo
Disadvantages of Caching
Advantages of Broadcast
Architecture of Broadcast
Serialization
Serializer
DataFrame
UDF
Filter Data
Shuffle
Reducing Shuffle
Importance of File Format
Handling of Data
File Format Optimization
Executor Optimization
Out of Memory
Memory Tuning
Conclusion


Taught by

NashKnolX

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera