Apache Spark Core - Practical Optimization Techniques - Partition Shaping and Job Optimization
Offered By: Databricks via YouTube
Course Description
Overview
This conference talk covers practical optimization techniques for Apache Spark Core. It shows how to shape partitions and jobs so that Spark can apply its optimizations, how to eliminate skew, and how to maximize cluster utilization. The talk walks through several Spark partition-shaping methods and optimization strategies, including join optimizations, aggregate optimizations, salting, and multi-dimensional parallelism. It also covers the software hierarchy, hardware considerations, and practical demonstrations, along with techniques such as lazy loading, data skipping, and shuffle partition management. Further topics include input and output partitions, workload balancing, and persistence strategies, followed by more advanced material on DBIO Cache, join optimization, broadcast joins, and skew joins. The talk runs 1 hour and 32 minutes and is aimed at improving the performance and efficiency of Apache Spark Core in data analytics workloads.
Syllabus
Introduction
About Daniel
Agenda
Software Hierarchy
Demo
Hardware
Baseline
CPU Utilization
Ganglia Reports
Lazy Loading
Code
Data Skipping
Optimizations
Output
Shuffle Partitions
Workload
Shuffle Partition Example
Shuffle Partition Summary
Input Partition Summary
What Does This Do
Output Partitions
Workload Example
Partitions
Balance
Persistence
DBIO Cache
Join Optimization
Broadcast Join
Skew Joins
Group Bys
The Beast
Taught by
Databricks
Related Courses
Big Data Essentials, offered by A Cloud Guru
Big Data, offered by University of Adelaide via edX
Advanced Data Science with IBM, offered by IBM via Coursera
Amazon EMR Getting Started (Indonesian), offered by Amazon Web Services via AWS Skill Builder
Analisar e preparar dados com o Amazon SageMaker Data Wrangler e o Amazon EMR (Português (Brasil)) | Lab - Analyze and Prepare Data with Amazon SageMaker Data Wrangler and Amazon EMR (Portuguese (Brazil)), offered by Amazon Web Services via AWS Skill Builder