Building a Data Platform with Apache Spark on Kubernetes
Offered By: WeAreDevelopers via YouTube
Course Description
Overview
Explore the challenges and solutions of building a data platform using Apache Spark on Kubernetes in this 31-minute conference talk. Learn how PUBG Corporation migrated its on-demand data analytics platform to Spark on Kubernetes, serving millions of online gamers. Discover the Sphynx project, which manages on-demand Spark clusters and Jupyter Notebooks as containerized applications on Kubernetes. Gain insights into the main log pipeline, Apache Spark layer platform, batch systems, and data system domain. Understand Kubernetes deployment, scheduling, and platform architecture. Delve into workflows, best practices, monitoring strategies, and future work considerations. Walk away with key takeaways for implementing Spark on Kubernetes in large-scale data processing environments.
Syllabus
Introduction
Overview
Main Log Pipeline
Apache Spark
Layer Platform
Notebooks
Batch System
Spark Platform
Data System Domain
Problems
What is Kubernetes
Kubernetes Deployment
Kubernetes Scheduler
Platform Architecture
Workflow
Best Sauce
Challenges
Monitoring
Future Work
Key Takeaways
Questions
Taught by
WeAreDevelopers
Related Courses
Fundamentals of Containers, Kubernetes, and Red Hat OpenShift (Red Hat via edX)
Configuration Management for Containerized Delivery (Microsoft via edX)
Getting Started with Google Kubernetes Engine - Español (Google Cloud via Coursera)
Getting Started with Google Kubernetes Engine - 日本語版 (Google Cloud via Coursera)
Architecting with Google Kubernetes Engine: Foundations en Español (Google Cloud via Coursera)