How to Make Apache Spark on Kubernetes Run Reliably on Spot Instances

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Cloud Computing Courses, Kubernetes Courses, Data Processing Courses, Distributed Computing Courses, Cluster Management Courses, Cost Optimization Courses

Course Description

Overview

Discover how to run Apache Spark on Kubernetes reliably on spot instances in this 33-minute Databricks conference talk from Data + AI Summit 2022. Learn concrete guidelines and code examples for running Spark on spot VMs, which can provide up to 90% cost savings. Explore key practices such as keeping the driver on on-demand nodes while placing Spark executors on spot nodes, mixing instance types and sizes to reduce interruption risk, and leveraging cluster autoscaling. Gain insights into graceful executor decommissioning (available since Spark 3.1), which migrates shuffle files off an executor before it shuts down, and executor PVC reuse (available since Spark 3.2), which lets a restarted executor reattach the volume holding its shuffle data so that compute and shuffle storage are disaggregated. Understand the evolution of Spark on Kubernetes, including its architecture, its benefits, and how it compares to Spark on YARN. Examine real-world experiments that measure the impact of spot interruptions and of graceful executor decommissioning. The talk closes with a look at what Spark 3.3 adds for Spark on Kubernetes.
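
The practices listed in the overview map onto a handful of Spark configuration properties. The following is a minimal sketch, not the speakers' own code: it collects those properties in a Python dictionary and turns them into `spark-submit` arguments. The node label key `example.com/capacity-type`, its values, the storage class `standard`, the volume size, and the helper names are all assumptions made for illustration; the real label depends on the cloud provider or node provisioner in use. The per-role node selectors assume Spark 3.3 or later, decommissioning assumes Spark 3.1 or later, and PVC reuse assumes Spark 3.2 or later.

```python
from typing import Dict, List


def spot_friendly_conf(
    capacity_label: str = "example.com/capacity-type",  # hypothetical node label key
    storage_class: str = "standard",                    # hypothetical storage class
    shuffle_volume_size: str = "100Gi",                 # illustrative size only
) -> Dict[str, str]:
    """Build Spark properties for running executors on spot nodes."""
    # A volume whose name starts with "spark-local-dir-" is used by Spark
    # as local scratch space, which is where shuffle files are written.
    volume = (
        "spark.kubernetes.executor.volumes."
        "persistentVolumeClaim.spark-local-dir-1"
    )
    return {
        # Keep the driver on on-demand nodes and the executors on spot nodes.
        # The label values "on-demand" and "spot" are assumptions.
        f"spark.kubernetes.driver.node.selector.{capacity_label}": "on-demand",
        f"spark.kubernetes.executor.node.selector.{capacity_label}": "spot",
        # Graceful executor decommissioning: migrate shuffle and cached RDD
        # blocks away from an executor that is about to be shut down.
        "spark.decommission.enabled": "true",
        "spark.storage.decommission.enabled": "true",
        "spark.storage.decommission.shuffleBlocks.enabled": "true",
        "spark.storage.decommission.rddBlocks.enabled": "true",
        # Executor PVC reuse: the driver owns the claims and hands them to
        # replacement executors instead of creating new ones.
        "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
        "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
        # Dynamically provisioned claim mounted as Spark local storage.
        f"{volume}.options.claimName": "OnDemand",
        f"{volume}.options.storageClass": storage_class,
        f"{volume}.options.sizeLimit": shuffle_volume_size,
        f"{volume}.mount.path": "/data/spark-local-dir-1",
        f"{volume}.mount.readOnly": "false",
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten a property dictionary into spark-submit '--conf k=v' pairs."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

Calling `to_submit_args(spot_friendly_conf())` yields a flat argument list that could be appended to a `spark-submit` command line. Property availability should be checked against the configuration reference for the Spark version actually deployed.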

Syllabus

Intro
Kubernetes is a new cluster manager for Spark
The Spark on Kubernetes Journey
Spark on YARN: architecture & pain points
Spark on Kubernetes: architecture & benefits
Our background - Ocean for Apache Spark
Spot instances
How does Spark cope with spot interruptions?
Best practice: run driver on on-demand (OD), execs on Spot
This is what your cluster may look like
Limitation: Avoid cross-AZ data transfer
We ran an experiment to measure the impact
Experiment results
Since Spark 3.1: Graceful Exec Decommissioning
Spark 3.1 - Graceful Exec Decommissioning
Graceful Exec Decommissioning - Experiment
Since Spark 3.2: Executor PVC Reuse
What's new in Spark 3.3 for Spark-on-k8s
DATA+AI SUMMIT 2022


Taught by

Databricks

Related Courses

Introduction to Cloud Infrastructure Technologies
Linux Foundation via edX
Scalable Microservices with Kubernetes
Google via Udacity
Google Cloud Fundamentals: Core Infrastructure
Google via Coursera
Introduction to Kubernetes
Linux Foundation via edX
Fundamentals of Containers, Kubernetes, and Red Hat OpenShift
Red Hat via edX