YoVDO

Managing Multi-Cloud Apache Spark on Kubernetes

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Apache Spark Courses Cloud Computing Courses Kubernetes Courses Cluster Management Courses Autoscaling Courses Observability Courses Multi-Cloud Management Courses

Course Description

Overview

Explore the challenges and solutions of managing multi-cloud Apache Spark on Kubernetes in this 31-minute conference talk by Ilan Filonenko and Aki Sukegawa from Bloomberg. Follow Bloomberg's journey of building multi-cloud quant platforms on Kubernetes for financial applications with integrated data science capabilities. Learn about the complexities of managing data science infrastructure across multiple cloud environments, with a focus on Apache Spark. Discover strategies for managing Spark infrastructure that spans bare-metal and public cloud platforms. Examine approaches to autoscaling, scheduling, preemption, and security in Kubernetes. Gain insight into observability techniques, including ways to surface cluster information to diverse Spark end users through native Kubernetes mechanisms such as node autoscalers, controllers, and custom PodConditions. Follow the speakers as they discuss user stories, complications, and solutions, covering topics such as custom resources, cluster scaling, event handling, and PodStatus Controller behavior.
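The description mentions surfacing cluster information to Spark users through controllers and custom PodConditions. The following Python sketch shows one way a controller step could turn cluster autoscaler events into a pod condition; it is an illustration, not the speakers' implementation. It assumes Event and PodCondition objects are plain dicts shaped like the Kubernetes API JSON and that `lastTimestamp` values are UTC ISO 8601 strings in a single format (so they sort as text). The condition type `example.com/ClusterAutoscaling` is hypothetical. `TriggeredScaleUp` and `NotTriggerScaleUp` are reasons the Kubernetes cluster autoscaler records on pending pods.

```python
from datetime import datetime, timezone

# Reasons the Kubernetes cluster autoscaler records on pending pods.
SCALE_UP_REASONS = {"TriggeredScaleUp", "NotTriggerScaleUp"}

# Hypothetical condition type, chosen for illustration only.
CONDITION_TYPE = "example.com/ClusterAutoscaling"


def autoscaler_events_for_pod(events, namespace, pod_name):
    """Return autoscaler events that reference the given pod, oldest first.

    Assumes lastTimestamp values are UTC ISO 8601 strings in one format,
    so plain string sorting orders them chronologically.
    """
    matches = [
        e for e in events
        if e.get("reason") in SCALE_UP_REASONS
        and e["involvedObject"].get("kind") == "Pod"
        and e["involvedObject"].get("namespace") == namespace
        and e["involvedObject"].get("name") == pod_name
    ]
    return sorted(matches, key=lambda e: e["lastTimestamp"])


def condition_from_events(events, namespace, pod_name, now=None):
    """Summarize the newest autoscaler event as a PodCondition-shaped dict.

    Returns None when the autoscaler has not reported on this pod.
    """
    relevant = autoscaler_events_for_pod(events, namespace, pod_name)
    if not relevant:
        return None
    latest = relevant[-1]
    if now is None:
        now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "type": CONDITION_TYPE,
        # "True": a scale-up was triggered; "False": the autoscaler declined.
        "status": "True" if latest["reason"] == "TriggeredScaleUp" else "False",
        "reason": latest["reason"],
        "message": latest.get("message", ""),
        "lastTransitionTime": now,
    }
```

In a real controller, the events would come from a list or watch on the Events API and the resulting condition would be patched onto the pod's status; both steps are left out of this sketch.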

Syllabus

Intro
Background (Kubernetes)
Background (Apache Spark)
Background (Spark)
User Stories (Complications)
User Stories (Solutions)
Why custom resource (CR)
Storing information, where to?
First: Cluster scaling up
Cluster autoscaler events
Controller to look up event objects
Next: Scaling down, OOM, etc.
Keeping pods?
Kubernetes custom resource (CR)
PodStatus Controller behavior
Extra: Declarative copying
Extensions
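The later syllabus items (scaling down, OOM, keeping pods, custom resources, PodStatus Controller behavior) point at recording pod outcomes somewhere that outlives the pod itself. As an illustration only, this Python sketch copies a pod's phase, labels, and container termination details into a custom-resource-shaped dict. The `apiVersion` `example.com/v1alpha1` and kind `SparkPodRecord` are hypothetical; the pod fields it reads (`status.containerStatuses[].state.terminated`, `lastState.terminated`, and the `OOMKilled` reason) follow the Kubernetes Pod API.

```python
# Hypothetical API group/version and kind; not a published CRD.
API_VERSION = "example.com/v1alpha1"
KIND = "SparkPodRecord"


def termination_of(container_status):
    """Return the current termination details, else the previous ones, else None."""
    current = container_status.get("state", {}).get("terminated")
    previous = container_status.get("lastState", {}).get("terminated")
    return current or previous


def record_from_pod(pod):
    """Copy the facts worth keeping from a pod into a custom-resource-shaped dict."""
    meta = pod["metadata"]
    status = pod.get("status", {})
    containers = []
    for cs in status.get("containerStatuses", []):
        term = termination_of(cs)
        if term is None:
            continue  # still running or waiting: nothing to record yet
        containers.append({
            "name": cs["name"],
            "reason": term.get("reason", ""),
            "exitCode": term.get("exitCode"),
            "oomKilled": term.get("reason") == "OOMKilled",
        })
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {
            "name": meta["name"],
            "namespace": meta["namespace"],
            # Copy labels so the record can be selected the same way as the pod.
            "labels": dict(meta.get("labels", {})),
        },
        "status": {
            "phase": status.get("phase", "Unknown"),
            "terminatedContainers": containers,
        },
    }
```

A controller could create such an object when it sees a pod terminate, so the pod can be garbage-collected while details such as an OOM kill remain queryable.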


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera