YoVDO

Improving Apache Spark Application Processing Time - Configuration and Optimization Techniques

Offered By: Databricks via YouTube

Tags

Apache Spark Courses Java Courses Kubernetes Courses Azure Blob Storage Courses Garbage Collection Courses

Course Description

Overview

Explore techniques for reducing Apache Spark application processing time in this 25-minute Databricks session. Learn how a Spark Structured Streaming application's micro-batch time was reduced from ~55 to ~30 seconds, illustrated with real-world use cases. Discover optimization strategies for applications that process ~700 MB/s of compressed data under strict KPIs, using Spark 3.1, Kafka, Azure Blob Storage, AKS (Azure Kubernetes Service), and Java 11. Gain insights into Spark configuration changes, code optimizations, and the implementation of custom data sources. Topics include the input architecture, the Spark Data Source implementation, partitioning strategies, dynamic task allocation, choosing the optimal number of partitions, and Garbage Collection analysis, including the Garbage First (G1) GC.
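The arithmetic behind a dynamic number of tasks can be sketched briefly. At ~700 MB/s, a ~30-second micro-batch carries roughly 21,000 MB of compressed input, so the task count has to follow each batch's volume. The function below is a minimal, hypothetical heuristic, not the method presented in the session: it caps the bytes each task reads and rounds the count up to a multiple of the cluster's cores so the last scheduling wave is not left mostly idle. The 128 MB target (which matches the default of Spark's `spark.sql.files.maxPartitionBytes`) and the 48-core cluster are illustrative assumptions.

```python
import math

MB = 1024 * 1024  # "MB" is treated as MiB in this sketch


def tasks_for_batch(input_bytes: int, target_bytes_per_task: int, total_cores: int) -> int:
    """Hypothetical sizing heuristic for one micro-batch.

    1. Split the batch so no task reads more than target_bytes_per_task.
    2. Round up to a multiple of total_cores so every wave keeps all cores busy.
    """
    if target_bytes_per_task <= 0 or total_cores <= 0:
        raise ValueError("target_bytes_per_task and total_cores must be positive")
    by_size = max(1, math.ceil(input_bytes / target_bytes_per_task))
    waves = math.ceil(by_size / total_cores)
    return waves * total_cores


# ~700 MB/s over a ~30 s micro-batch (figures from the session description);
# 128 MB per task and 48 cores are assumptions, not values from the talk.
print(tasks_for_batch(700 * MB * 30, 128 * MB, 48))
```

Under these assumptions the batch needs 165 tasks by size alone, which would leave 27 of the 48 cores idle during the fourth wave; rounding up to 192 fills all four waves with slightly smaller tasks.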

Syllabus

Intro
About CSI Group (Cloud Security Intelligence)
Application Architecture and Overview
Input Architecture
Read Phase: Spark Data Source Overview
Spark Data Source Implementation
Partitioning Strategies
Dynamic Number of Tasks
Custom Spark Data Source - Summary
Optimal Number of Partitions
Garbage Collection - Analysis
Garbage First (G1) GC
Garbage Collection - Summary
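One common starting point for a garbage collection analysis is to total the stop-the-world pauses recorded in executor GC logs. G1 is already the default collector in Java 11, and its unified logging (for example `-Xlog:gc`, passed to executors through `spark.executor.extraJavaOptions`) writes one line per pause. The sketch below is an assumption-laden illustration, not necessarily the approach used in the session: it assumes the default JDK 11 `-Xlog:gc` line layout, and the sample lines are made up for demonstration.

```python
import re
from collections import defaultdict

# Assumed JDK 11 unified-logging pause line, e.g.
# [12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(2048M) 35.210ms
PAUSE_RE = re.compile(r"GC\(\d+\) (Pause \w+).*?(\d+(?:\.\d+)?)ms\s*$")


def summarize_pauses(lines):
    """Return {pause kind: (count, total_ms)}; non-pause lines such as 'Concurrent Cycle' are skipped."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            kind, ms = match.group(1), float(match.group(2))
            stats[kind][0] += 1
            stats[kind][1] += ms
    return {kind: (count, total) for kind, (count, total) in stats.items()}


# Illustrative lines only; they are not taken from the talk.
sample = [
    "[10.100s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(2048M) 35.210ms",
    "[12.400s][info][gc] GC(8) Pause Young (Concurrent Start) (G1 Humongous Allocation) 900M->300M(2048M) 20.000ms",
    "[13.050s][info][gc] GC(9) Pause Remark 400M->400M(2048M) 4.790ms",
    "[13.200s][info][gc] GC(9) Concurrent Cycle 480.500ms",
]
print(summarize_pauses(sample))
```

Concurrent phases run alongside application threads, so the sketch counts only lines whose event name starts with `Pause`; comparing the pause totals with the micro-batch duration shows how much of each batch is lost to GC.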


Taught by

Databricks
