
Hybrid Apache Spark Architecture: Optimizing YARN and Kubernetes for Lyft's Workloads

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Machine Learning Courses, Kubernetes Courses, Scalability Courses, Containerization Courses

Course Description

Overview

Explore a 44-minute conference talk from Databricks on Lyft's hybrid Apache Spark architecture, which runs on both YARN and Kubernetes. The talk covers the challenges Lyft faced when scaling its batch ETL and ML Spark workloads on Kubernetes, and the hybrid solution it developed to support both containerized and non-containerized workloads. Learn about the dynamic runtime controller, which applies environment-specific configurations and switches between resource managers. Topics include Spark use cases, scaling challenges, image management, the advantages of the hybrid approach, the Spark Operator, image hierarchy distribution, and recent improvements. The talk concludes with future plans and key takeaways for running Spark at scale in a transportation technology environment.
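To make the idea of a runtime controller more concrete, here is a minimal sketch of how a job's Spark configuration might be resolved per environment and routed to one resource manager or the other. It is not Lyft's implementation. The `JobSpec` class, the `resolve_spark_conf` function, the environment names, the executor counts, the API server address, and the routing rule (containerized jobs go to Kubernetes, everything else to YARN) are all assumptions made for illustration. Only the configuration keys (`spark.master`, `spark.app.name`, `spark.executor.instances`, `spark.kubernetes.container.image`) are standard Spark settings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobSpec:
    """Hypothetical description of a Spark job submitted to the controller."""
    name: str
    environment: str                       # e.g. "staging" or "production" (assumed labels)
    container_image: Optional[str] = None  # set only for containerized jobs


# Assumed per-environment defaults; the values are placeholders.
ENV_DEFAULTS: Dict[str, Dict[str, str]] = {
    "staging": {"spark.executor.instances": "2"},
    "production": {"spark.executor.instances": "20"},
}

# Placeholder Kubernetes API server address, not a real endpoint.
K8S_API_SERVER = "https://k8s.example.internal:6443"


def resolve_spark_conf(job: JobSpec) -> Dict[str, str]:
    """Build a Spark configuration for the job.

    Assumed routing rule: jobs that declare a container image run on
    Kubernetes, and all other jobs run on YARN.
    """
    # Start from the environment defaults (empty if the environment is unknown).
    conf = dict(ENV_DEFAULTS.get(job.environment, {}))
    conf["spark.app.name"] = job.name

    if job.container_image:
        # Spark's Kubernetes master URL has the form k8s://<api-server-url>.
        conf["spark.master"] = f"k8s://{K8S_API_SERVER}"
        conf["spark.kubernetes.container.image"] = job.container_image
    else:
        conf["spark.master"] = "yarn"
    return conf


if __name__ == "__main__":
    etl_job = JobSpec(name="nightly-etl", environment="production")
    ml_job = JobSpec(
        name="model-training",
        environment="staging",
        container_image="registry.example.internal/spark-ml:latest",  # placeholder
    )
    for job in (etl_job, ml_job):
        print(job.name, resolve_spark_conf(job))
```

Keeping the routing decision in one function means that a job definition does not need to name a resource manager itself, which is one way the switching described above could stay transparent to job authors.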

Syllabus

Introduction
Agenda
Spark Use Cases
YARN in 2018
Scaling
Challenges
Image Management
Hybrid Approach
Hybrid Architecture
Hybrid Architecture Advantages
Spark Operator
Image Hierarchy Distribution
Recap
Improvements
Future Plans
Takeaways


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera