Running Big Data Applications at Scale on K8s
Offered By: CNCF [Cloud Native Computing Foundation] via YouTube
Course Description
Overview
Explore how Intuit uses Spark on Kubernetes to process big data at scale in this conference talk. Learn about the advantages of running data processing workloads on Kubernetes, including cost reduction and increased production speed. Discover Intuit's journey in building a data processing platform, their experience with the Spark operator, and how they addressed challenges such as network bottlenecks. Gain insight into the benefits of containerization for data scientists and the future plans for Intuit's big data infrastructure. See how Kubernetes compares with YARN, how native workflows fit in, and what the approach means for transaction categorization and personalization problems.
Syllabus
Introduction
Intuit's Data Lake
Kubernetes vs YARN
Native workflow
Spark operator
Transaction categorization
Personalization problem
Learnings
SpoK Part 3
Advantages
Future plans
Contact details
Questions
Cost Reduction
Cost Advantages
Effort
Spark
Network bottlenecks
Spark Operator on Kubernetes
What new things should data scientists learn
Container Journey
Slides
Spark Containers
Feature Processing
Spark Overlay
What do data scientists need to learn
Wrap up
Taught by
CNCF [Cloud Native Computing Foundation]
Related Courses
Coding the Matrix: Linear Algebra through Computer Science Applications
Brown University via Coursera
كيف تفكر الآلات - مقدمة في تقنيات الحوسبة
King Fahd University of Petroleum and Minerals via Rwaq (رواق)
Datascience et Analyse situationnelle : dans les coulisses du Big Data
IONIS via IONIS
Data Lakes for Big Data
EdCast
統計学Ⅰ:データ分析の基礎 (ga014)
University of Tokyo via gacco