Modernizing Apache Spark 3.0 Applications - Best Practices and Optimization Techniques
Offered By: Databricks via YouTube
Course Description
Overview
Explore strategies for modernizing Apache Spark applications to take full advantage of Spark 3.0 and beyond in this 25-minute talk by Databricks. Learn about common sources of technical debt in mature Spark applications and how to address them. Discover when to replace manual configuration tuning with Adaptive Query Execution, and understand how to optimize queries for columnar processing and GPU execution. Gain insights from a concrete customer churn modeling example, recent experiences in modernizing Spark applications, and lessons learned from maintaining Spark extensions across multiple versions. Delve into topics such as the Diderot effect in data processing systems, Project Tungsten, adaptive query execution techniques, and accelerating Spark with NVIDIA GPUs. Learn how to speed up your analytics workloads and incorporate accelerated ML training directly into your Spark applications.
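As an illustration of replacing manual configuration with Adaptive Query Execution, the sketch below collects the Spark 3.0 settings that let AQE choose post-shuffle partition counts at runtime. This avoids hand-tuning `spark.sql.shuffle.partitions` for each job. The dictionary name `AQE_SETTINGS`, the helper `build_session`, and the app name are hypothetical. Only the configuration keys are real Spark properties. Calling the helper requires `pyspark` to be installed.

```python
# Minimal sketch (hypothetical names): Spark 3.0 settings that let Adaptive
# Query Execution (AQE) size shuffle partitions at runtime instead of relying
# on a hand-tuned spark.sql.shuffle.partitions value.
AQE_SETTINGS = {
    # Turn AQE on (disabled by default in Spark 3.0 and 3.1, enabled from 3.2).
    "spark.sql.adaptive.enabled": "true",
    # Merge small adjacent post-shuffle partitions into larger ones.
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    # Target size AQE aims for when coalescing partitions.
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64MB",
    # Split heavily skewed partitions in sort-merge joins.
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def build_session(app_name="aqe-sketch"):
    """Create a SparkSession with the AQE settings above (requires pyspark)."""
    # Imported lazily so the settings can be inspected without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in AQE_SETTINGS.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The settings are kept in a plain dictionary so that the same values can be reused in `spark-submit --conf` flags or in a cluster configuration file.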
Syllabus
Intro
Denis Diderot and the Diderot effect
The Diderot effect in data processing systems
The Diderot effect in Spark: Project Tungsten (2015)
The Diderot effect, revised for 2021
What's your oldest Spark application?
Abstractions can leak in performance tuning
Choosing the right partition size is difficult
Adaptive query execution: coalescing
Sidebar: some basics on joins
Adaptive query execution: partition pruning
Enabling adaptive query execution
Accelerating Spark with NVIDIA GPUs
Case study: predicting customer churn
What's next?
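The coalescing step of adaptive query execution merges small, contiguous shuffle partitions once their actual sizes are known. This addresses the difficulty of choosing a partition size up front. The toy model below is a simplified, pure-Python illustration of that idea. It is not Spark's implementation. The function name `coalesce_partitions` is hypothetical, and sizes are treated as abstract units (for example, megabytes).

```python
def coalesce_partitions(sizes, target):
    """Greedily group contiguous partitions so each group's total stays <= target.

    Simplified model of AQE partition coalescing (not Spark's actual code).
    A partition larger than `target` on its own becomes a single-element group.
    Returns a list of (start, end) index ranges, with `end` exclusive.
    """
    groups = []
    start = 0      # first partition index of the current group
    current = 0    # running total size of the current group
    for i, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed target.
        if i > start and current + size > target:
            groups.append((start, i))
            start = i
            current = 0
        current += size
    if sizes:
        groups.append((start, len(sizes)))
    return groups


# Six shuffle partitions collapse into three tasks with a 64-unit target.
print(coalesce_partitions([10, 20, 5, 70, 15, 15], 64))  # [(0, 3), (3, 4), (4, 6)]
```

Only adjacent partitions are merged because each post-shuffle partition corresponds to a contiguous range of shuffle output. Grouping neighbours therefore lets one task read a single range per map output.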
Taught by
Databricks