YoVDO

Cloud-Native Apache Spark Scheduling with YuniKorn on Kubernetes

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Big Data Courses, Machine Learning Courses, Kubernetes Courses, Cloudera Courses, Databricks Courses

Course Description

Overview

Explore cloud-native Apache Spark scheduling using YuniKorn Scheduler in this 36-minute conference talk from Databricks. Dive into the architecture of cloud-native infrastructure and learn how YuniKorn, an open-source resource scheduler, redefines resource scheduling in the cloud. Discover how to manage quotas, resource sharing, and auto-scaling for efficient scheduling of large-scale Spark jobs on Kubernetes. Gain insights into Lyft and Cloudera's experiences with next-generation cloud-native infrastructure, and understand the challenges and solutions for running Spark on Kubernetes. Learn about YuniKorn's advantages over default schedulers, including job ordering, resource quota management, and fairness in queue allocation. Compare YuniKorn with other Kubernetes schedulers and explore its management console. Get an overview of YuniKorn's current status, community involvement, roadmap, and vision for resource management in big data and machine learning environments.

Syllabus

Intro
Role of K8s in Lyft's Data Landscape
Multi-step creation for a Spark K8s job
Problems of existing Spark K8s infrastructure: Complexity of layers of custom K8s controllers to handle the scale of the
Why we need a customized K8s Scheduler
Flavors of Running Spark on K8s
Resource Scheduling in K8s
Spark on K8s: the scheduling challenges
Apache YuniKorn (Incubating)
Resource Scheduling in YuniKorn (and compare w/ default scheduler)
Main difference (YuniKorn vs. Default Scheduler)
Run Spark with YuniKorn
Job Ordering
Resource Quota Management: K8s Namespace ResourceQuota
Resource Quota Management: YuniKorn Queue Capacity
Resource Fairness in YuniKorn Queues
Scheduler Throughput Benchmark
Fully K8s Compatible
YuniKorn Management Console
Compare YuniKorn with other K8s schedulers
Current Status
The Community
Roadmap
Our Vision - Resource Mgmt for Big Data & ML


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera