Building a Data Platform with Apache Spark on Kubernetes
Offered By: WeAreDevelopers via YouTube
Course Description
Overview
Explore the challenges and solutions of building a data platform with Apache Spark on Kubernetes in this 31-minute conference talk. Learn how PUBG Corporation, which serves millions of online gamers, migrated its on-demand data analytics platform to Spark on Kubernetes. Discover the Sphynx project, which manages on-demand Spark clusters and Jupyter Notebooks as containerized applications on Kubernetes. Gain insights into the main log pipeline, Apache Spark layer platform, batch systems, and data system domain. Understand Kubernetes deployment, scheduling, and the platform architecture. Delve into workflows, best practices, challenges, monitoring strategies, and future work. Walk away with key takeaways for running Spark on Kubernetes in large-scale data processing environments.
Syllabus
Introduction
Overview
Main Log Pipeline
Apache Spark
Layer Platform
Notebooks
Batch System
Spark Platform
Data System Domain
Problems
What is Kubernetes
Kubernetes Deployment
Kubernetes Scheduler
Platform Architecture
Workflow
Best Sauce
Challenges
Monitoring
Future Work
Key Takeaways
Questions
Taught by
WeAreDevelopers
Related Courses
CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera