YoVDO

Super Reliable Cloud Native Data Processing Using Apache Spark and Cloud Shuffle Manager

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Apache Spark Courses, Big Data Courses, Cloud Computing Courses, Kubernetes Courses, Data Processing Courses, Fault Tolerance Courses, Cloud Storage Courses, Cost Optimization Courses

Course Description

Overview

Explore a conference talk on making Apache Spark more reliable for cloud-native data processing with Cloud Shuffle Manager. Apple engineers Bo Yang and HAI TAO address a fault-tolerance gap that appears when Spark runs on Kubernetes: Spark's internal shuffle data lives on worker nodes, so losing a worker can lose its shuffle output and force recomputation. Learn how Cloud Shuffle Manager replicates shuffle data to cloud storage, so that Spark reads shuffle data from workers under normal conditions and falls back to the cloud storage copy when a worker fails. Gain insights into the underlying optimizations that improve shuffle performance, and see how this approach lets Spark applications run reliably on Spot Instances/VMs, which can yield significant cost savings at scale. Understand how this solution can make large-scale data processing in cloud environments both more reliable and more cost-effective.
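The description above outlines a read path: serve shuffle blocks from the worker that wrote them, and fall back to a replica in cloud storage if that worker is gone. The talk listing does not describe Cloud Shuffle Manager's actual interfaces, so the following is only a toy, in-memory Python model of that idea. All names (`Worker`, `CloudStore`, `ReplicatedShuffle`, `WorkerUnavailable`) are hypothetical and are not Spark or Cloud Shuffle Manager APIs. It also assumes replication happens synchronously on write, which a real system might do differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical block identifier: (shuffle_id, map_id, reduce_id).
BlockId = Tuple[int, int, int]


class WorkerUnavailable(Exception):
    """Raised when the worker holding a shuffle block cannot serve it."""


@dataclass
class Worker:
    # Toy stand-in for an executor's local shuffle files.
    alive: bool = True
    blocks: Dict[BlockId, bytes] = field(default_factory=dict)

    def fetch(self, block_id: BlockId) -> bytes:
        if not self.alive:
            raise WorkerUnavailable(block_id)
        return self.blocks[block_id]


@dataclass
class CloudStore:
    # Toy stand-in for durable object storage holding shuffle replicas.
    objects: Dict[BlockId, bytes] = field(default_factory=dict)

    def get(self, block_id: BlockId) -> bytes:
        return self.objects[block_id]


@dataclass
class ReplicatedShuffle:
    workers: Dict[str, Worker]
    cloud: CloudStore

    def write(self, worker_id: str, block_id: BlockId, data: bytes) -> None:
        # Keep a local copy for fast reads and a cloud replica for durability
        # (assumed synchronous here for simplicity).
        self.workers[worker_id].blocks[block_id] = data
        self.cloud.objects[block_id] = data

    def read(self, worker_id: str, block_id: BlockId) -> Tuple[bytes, str]:
        # Normal path: read from the worker that produced the block.
        worker = self.workers.get(worker_id)
        if worker is not None:
            try:
                return worker.fetch(block_id), "worker"
            except WorkerUnavailable:
                pass  # worker lost (e.g. spot VM reclaimed); use the replica
        # Failure path: read the replicated copy from cloud storage.
        return self.cloud.get(block_id), "cloud"


if __name__ == "__main__":
    workers = {"exec-1": Worker()}
    shuffle = ReplicatedShuffle(workers, CloudStore())
    block = (0, 0, 0)
    shuffle.write("exec-1", block, b"partition-0")
    print(shuffle.read("exec-1", block))  # (b'partition-0', 'worker')
    workers["exec-1"].alive = False  # simulate losing the worker
    print(shuffle.read("exec-1", block))  # (b'partition-0', 'cloud')
```

The point of the design is that the cloud copy is only touched on failure, so the common case keeps worker-local read performance while a lost worker no longer means lost shuffle output.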

Syllabus

Super Reliable Cloud Native Data Processing Using Apache Spark and Cloud Shuffle Manager


Taught by

CNCF [Cloud Native Computing Foundation]
