Super Reliable Cloud Native Data Processing Using Apache Spark and Cloud Shuffle Manager
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Explore a conference talk on making Apache Spark more reliable for cloud-native data processing with Cloud Shuffle Manager. Apple engineers Bo Yang and Hai Tao tackle fault tolerance for Spark's internal shuffle data when Spark runs on Kubernetes. Learn how Cloud Shuffle Manager stores replicas of shuffle data on cloud storage. Under normal conditions Spark reads shuffle data from workers, and when a worker fails it reads from cloud storage instead. Gain insight into the underlying optimizations that improve shuffle performance. See how this approach lets Spark applications run reliably on Spot Instances or Spot VMs, which yields significant cost savings at scale. Understand how this solution can make large-scale data processing in cloud environments both more reliable and more cost-effective.
Syllabus
Super Reliable Cloud Native Data Processing Using Apache Spark and Cloud Shuffle Manager
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Coding the Matrix: Linear Algebra through Computer Science Applications
Brown University via Coursera
How Machines Think: An Introduction to Computing Technologies (كيف تفكر الآلات - مقدمة في تقنيات الحوسبة)
King Fahd University of Petroleum and Minerals via Rwaq (رواق)
Data Science and Situational Analysis: Behind the Scenes of Big Data (Datascience et Analyse situationnelle : dans les coulisses du Big Data)
IONIS via IONIS
Data Lakes for Big Data
EdCast
Statistics I: Fundamentals of Data Analysis (統計学Ⅰ:データ分析の基礎) (ga014)
University of Tokyo via gacco