YoVDO

Different Streaming Methods with Apache Spark and Kafka

Offered By: Databricks via YouTube

Tags

Apache Spark Courses, Big Data Courses, Data Lakes Courses, Stream Processing Courses, Real-Time Analytics Courses, Spark Streaming Courses, ETL Courses, Spark Structured Streaming Courses

Course Description

Overview

Explore different streaming methods using Apache Spark and Kafka in this 34-minute conference talk by Itai Yaffe from Nielsen. Learn how Nielsen Marketing Cloud (NMC) transformed its data infrastructure to support real-time analytics for marketers and publishers. Follow the journey from CSV files and standalone Java applications to multiple Kafka and Spark clusters that run a mix of streaming and batch ETL jobs while supporting 10x data growth. Gain insights from NMC's experience as an early adopter of Spark Streaming and Spark Structured Streaming, including the technical challenges the team had to overcome. Examine a unique solution that uses Kafka to simulate streaming over a data lake, reducing cloud service costs. Topics include Kafka and Spark Streaming for stateless and stateful use cases, Spark Structured Streaming as an alternative, combining Spark Streaming with batch ETLs, and "streaming" over a data lake using Kafka.

Syllabus

Intro
Problems
What's Next
Local Aggregation
Weaknesses
Kafka
Summary
Recap
Big Data for Women
Questions


Taught by

Databricks

Related Courses

Web Intelligence and Big Data (Indian Institute of Technology Delhi via Coursera)
Big Data for Better Performance (Open2Study)
Big Data and Education (Columbia University via edX)
Big Data Analytics in Healthcare (Georgia Institute of Technology via Udacity)
Data Mining with Weka (University of Waikato via Independent)