Modernizing Apache Spark 3.0 Applications - Best Practices and Optimization Techniques
Offered By: Databricks via YouTube
Course Description
Overview
Explore strategies for modernizing Apache Spark applications to leverage the full potential of Spark 3.0 and beyond in this 25-minute talk by Databricks. Learn about common sources of technical debt in mature Spark applications and how to address them, discover when to replace manual configurations with Adaptive Query Execution, and understand how to optimize queries for columnar processing and GPU execution. Gain insights from concrete examples of customer churn modeling, recent experiences in modernizing Spark applications, and lessons learned from maintaining Spark extensions across multiple versions. Delve into topics such as the Diderot effect in data processing systems, Project Tungsten, adaptive query execution techniques, and accelerating Spark with NVIDIA GPUs. Acquire valuable knowledge to enhance your analytics workloads and incorporate accelerated ML training directly into your Spark applications.
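As a rough illustration of replacing manual tuning with Adaptive Query Execution, the sketch below collects a few Spark SQL settings and renders them as `spark-submit` flags. The configuration keys are standard Spark 3.x property names; the helper `to_submit_flags` and the `AQE_SETTINGS` dictionary are hypothetical names introduced for this example, not something taken from the talk. Defaults vary by Spark version (adaptive execution is off by default in Spark 3.0), so treat the values as a starting point rather than a recommendation.

```python
# Minimal sketch: gather adaptive-execution settings in one place and render
# them as spark-submit arguments. Helper and variable names are hypothetical.

AQE_SETTINGS = {
    # Re-optimize query plans at runtime using shuffle statistics.
    "spark.sql.adaptive.enabled": "true",
    # Merge small shuffle partitions instead of hand-tuning their number.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Split heavily skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Skip partitions of a fact table based on filters on a joined table.
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
}


def to_submit_flags(settings):
    """Return a flat list of spark-submit arguments, sorted by key."""
    flags = []
    for key, value in sorted(settings.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    print(" ".join(to_submit_flags(AQE_SETTINGS)))
```

The same key/value pairs can be passed to a session builder or placed in a cluster configuration file; keeping them in one dictionary makes it easier to see which manual settings, such as a hard-coded shuffle partition count, are candidates for removal.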
Syllabus
Intro
Denis Diderot and the Diderot effect
The Diderot effect in data processing systems
The Diderot effect in Spark: Project Tungsten (2015)
The Diderot effect, revised for 2021
What's your oldest Spark application?
Abstractions can leak in performance tuning
Choosing the right partition size is difficult
Adaptive query execution: coalescing
Sidebar: some basics on joins
Adaptive query execution: partition pruning
Enabling adaptive query execution
Accelerating Spark with NVIDIA GPUs
Case study: predicting customer churn
What's next?
Taught by
Databricks
Related Courses
Fundamentals of Accelerated Computing with CUDA C/C++ (Nvidia via Independent)
Using GPUs to Scale and Speed-up Deep Learning (IBM via edX)
Deep Learning (IBM via edX)
Deep Learning with IBM (IBM via edX)
Accelerating Deep Learning with GPUs (IBM via Cognitive Class)