Different Streaming Methods with Apache Spark and Kafka
Offered By: Databricks via YouTube
Course Description
Overview
Explore different streaming methods using Apache Spark and Kafka in this 34-minute conference talk by Itai Yaffe from Nielsen. Learn how Nielsen Marketing Cloud (NMC) transformed its data infrastructure to support real-time analytics for marketers and publishers. Follow the journey from CSV files and standalone Java applications to multiple Kafka and Spark clusters that handle a mixture of streaming and batch ETLs while supporting 10x data growth. Gain insights into the team's early adoption of Spark Streaming and Spark Structured Streaming, including the technical challenges they hit and how they overcame them. Examine a solution that uses Kafka to simulate streaming over a Data Lake in order to reduce cloud service costs. Topics covered include Kafka and Spark Streaming for stateless and stateful use cases, Spark Structured Streaming as an alternative, combining Spark Streaming with batch ETLs, and "streaming" over a Data Lake using Kafka.
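The idea of "streaming" over a Data Lake with Kafka can be illustrated with a toy, in-memory Python sketch. It assumes (this is not taken from the talk) that every file written to the lake is announced as an event on a Kafka-like append-only log, and that a consumer remembers the offset it last committed, so each run processes only files it has not seen yet. `FileEventLog`, `MicroBatchConsumer`, and the file paths are hypothetical names; a real deployment would use a Kafka topic and consumer-group offsets rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FileEventLog:
    """Hypothetical stand-in for a Kafka topic: an append-only list of file paths."""
    entries: List[str] = field(default_factory=list)

    def publish(self, path: str) -> int:
        # Record a "new file landed in the lake" event and return its offset.
        self.entries.append(path)
        return len(self.entries) - 1

    def read_from(self, offset: int, max_records: int) -> List[str]:
        # Return up to max_records events, starting at the given offset.
        return self.entries[offset:offset + max_records]


@dataclass
class MicroBatchConsumer:
    """Hypothetical consumer that tracks its own committed offset."""
    log: FileEventLog
    process_files: Callable[[List[str]], None]
    committed_offset: int = 0

    def run_once(self, max_records: int = 100) -> List[str]:
        batch = self.log.read_from(self.committed_offset, max_records)
        if batch:
            # Process the new files as one batch job, then commit the offset.
            # If processing raises, the offset is not advanced and the same
            # paths are read again on the next run (at-least-once delivery).
            self.process_files(batch)
            self.committed_offset += len(batch)
        return batch


if __name__ == "__main__":
    processed: List[str] = []
    log = FileEventLog()
    consumer = MicroBatchConsumer(log, processed.extend)

    log.publish("lake/2019-01-01/part-0000.parquet")
    log.publish("lake/2019-01-01/part-0001.parquet")
    print(consumer.run_once())  # both files, read in one micro-batch
    print(consumer.run_once())  # nothing new yet, so an empty list
```

The design choice being illustrated is that the log carries only lightweight file pointers, not the data itself, so a periodic batch job can behave like a stream by reading "everything after my last offset" instead of repeatedly listing the storage bucket.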
Syllabus
Intro
Problems
What's Next
Local Aggregation
Weaknesses
Kafka
Summary
Recap
Big Data for Women
Questions
Taught by
Databricks
Related Courses
CS115x: Advanced Apache Spark for Data Science and Data Engineering (University of California, Berkeley via edX)
Big Data Analytics (University of Adelaide via edX)
Big Data Essentials: HDFS, MapReduce and Spark RDD (Yandex via Coursera)
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames (Yandex via Coursera)
Introduction to Apache Spark and AWS (University of London International Programmes via Coursera)