HDFS CSI Plugin: Speeding Up Kubernetes in On-Premises Big Data Clusters
Offered By: Linux Foundation via YouTube
Course Description
Overview
Explore the integration of Kubernetes with on-premises big data clusters through this conference talk. Learn about the HDFS CSI Plugin design and architecture, addressing the challenge of consuming HDFS data with Kubernetes. Discover best practices for running Spark workloads on Kubernetes with HDFS access using the CSI plugin. Examine performance comparisons between Spark on Kubernetes with HDFS and Spark on YARN with HDFS using the TPC-DS benchmark suite. Gain insights into big data history, containerization benefits, Kubernetes architecture, CSI core services, volume lifecycle management, and Hadoop HDFS characteristics as persistent volumes. Understand the potential of Kubernetes as an alternative to Hadoop YARN for resource scheduling in on-premises big data environments.
Syllabus
Intro
Outline
Big Data History Cont.
Big Data Stack
Big Data Trend
Benefit of Containerization
Kubernetes Architecture
Challenges
CSI (Container Storage Interface)
CSI Core Services
CSI Advanced Features
Volume Lifecycle
Controller and Node Services
Kubernetes Storage
Kubernetes CSI Support
PV, PVC and Storage Class
Package and Deployment Suggestion
Hadoop HDFS
HDFS Cluster Scale
Apache Ozone
HDFS/Ozone as PV
HDFS Characteristics as PV
HDFS NFS Gateway CSI
Ozone CSI
Resources
Taught by
Linux Foundation
Related Courses
Web Intelligence and Big Data (Indian Institute of Technology Delhi via Coursera)
Big Data for Better Performance (Open2Study)
Big Data and Education (Columbia University via edX)
Big Data Analytics in Healthcare (Georgia Institute of Technology via Udacity)
Data Mining with Weka (University of Waikato via Independent)