How to Make Apache Spark on Kubernetes Run Reliably on Spot Instances

Offered By: Databricks via YouTube

Course Description

Overview

This 33-minute Databricks conference talk gives concrete guidelines and code examples for running Apache Spark on Kubernetes reliably on spot VMs, which can provide up to 90% cost savings. It covers using spot nodes for Spark executors, mixing instance types and sizes to reduce interruption risk, and leveraging cluster autoscaling. It then explains the graceful decommissioning feature introduced in Spark 3.1, which preserves shuffle files when an executor shuts down, and the PVC reuse on executor restart introduced in Spark 3.2, which disaggregates compute from shuffle storage. The talk also traces the evolution of Spark on Kubernetes, including its architecture, its benefits, and how it compares to Spark on YARN, and presents real-world experiments measuring the impact of spot interruptions and graceful executor decommissioning. It closes with a look at what is new for Spark on Kubernetes in Spark 3.3.
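As a rough illustration of the settings the talk's topics map to, the sketch below assembles a set of Spark configuration properties for an on-demand driver, spot executors, graceful decommissioning, and executor PVC reuse, and renders them as `spark-submit` arguments. It is not taken from the talk. The node label key `example.com/capacity-type`, its values, the storage class, the volume size, and the mount path are placeholder assumptions that depend on your cloud provider and cluster; the helper names `spot_friendly_conf` and `to_submit_args` are hypothetical. The driver- and executor-specific node selector properties require Spark 3.3 or later.

```python
# Sketch only: builds Spark-on-Kubernetes settings for spot-friendly executors.
# Label key, label values, storage class, size, and mount path are placeholders.

def spot_friendly_conf(capacity_label="example.com/capacity-type",
                       storage_class="standard",
                       shuffle_volume_size="100Gi"):
    """Return Spark properties (as strings) for the setup described above."""
    # Volumes whose name starts with "spark-local-dir-" are used by Spark
    # as local scratch space, which is where shuffle files are written.
    volume = ("spark.kubernetes.executor.volumes."
              "persistentVolumeClaim.spark-local-dir-1")
    return {
        # Driver on on-demand nodes, executors on spot nodes (assumed labels).
        "spark.kubernetes.driver.node.selector." + capacity_label: "on-demand",
        "spark.kubernetes.executor.node.selector." + capacity_label: "spot",
        # Graceful decommissioning: migrate shuffle and cached RDD blocks
        # off an executor that is about to be shut down.
        "spark.decommission.enabled": "true",
        "spark.storage.decommission.enabled": "true",
        "spark.storage.decommission.shuffleBlocks.enabled": "true",
        "spark.storage.decommission.rddBlocks.enabled": "true",
        # PVC reuse: the driver owns executor PVCs and reattaches them
        # to replacement executors.
        "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
        "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
        # Dynamically provisioned PVC backing the executor's local directory.
        volume + ".options.claimName": "OnDemand",
        volume + ".options.storageClass": storage_class,
        volume + ".options.sizeLimit": shuffle_volume_size,
        volume + ".mount.path": "/data/spark-local-dir",
        volume + ".mount.readOnly": "false",
    }


def to_submit_args(conf):
    """Flatten a property dict into ['--conf', 'key=value', ...]."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(spot_friendly_conf())))
```

The printed arguments can be pasted into a `spark-submit` command after replacing the placeholder label, storage class, and size with values that exist in your cluster.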

Syllabus

Intro
Kubernetes is a new cluster manager for Spark
The Spark on Kubernetes Journey
Spark on YARN: architecture & pain points
Spark on Kubernetes: architecture & benefits
Our background - Ocean for Apache Spark
Spot instances
How does Spark cope with spot interruptions?
Best practice: run driver OD, execs on Spot
This is what your cluster may look like
Limitation: Avoid cross-AZ data transfer
We ran an experiment to measure the impact
Experiment results
Since Spark 3.1: Graceful Exec Decommissioning
Spark 3.1 - Graceful Exec Decommissioning
Graceful Exec Decommissioning - Experiment
Since Spark 3.2: Executor PVC Reuse
What's new in Spark 3.3 for Spark-on-k8s
DATA+AI SUMMIT 2022


Taught by

Databricks