YoVDO

Apache Spark and Databricks - Stream Processing in Lakehouse

Offered By: Udemy

Tags

Apache Spark Courses Databricks Courses Memory Management Courses Real-time Stream Processing Courses

Course Description

Overview

Master Stream processing using Apache Spark (PySpark) and Databricks Cloud (Azure) with an End-to-End Capstone Project

What you'll learn:
  • Real-time Stream Processing Concepts
  • Spark Structured Streaming APIs and Architecture
  • Working with Streaming Sources and Sinks
  • Kafka for Data Engineers
  • Working With Kafka Source and Integrating Spark with Kafka
  • Stateless and Stateful Streaming Transformations
  • Windowing Aggregates using Spark Structured Streaming
  • Watermarking and State Cleanup
  • Streaming Joins and Aggregation
  • Handling Memory Problems with Streaming Joins
  • Working with Azure Databricks
  • Capstone Project - Streaming application in Lakehouse
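To give a flavor of the windowing, watermarking, and state-cleanup topics listed above, here is a minimal pure-Python sketch of the semantics. This is a conceptual illustration only, not Spark's actual implementation; the event format and the 10-minute window and watermark sizes are assumed values for the example.

```python
from collections import defaultdict

WINDOW = 600     # tumbling window size in seconds (assumed: 10 minutes)
LATENESS = 600   # allowed lateness in seconds (assumed: 10 minutes)

def window_start(ts):
    """Map an event timestamp to the start of its tumbling window."""
    return ts - (ts % WINDOW)

class WindowedCounter:
    """Counts events per tumbling window, drops events that arrive
    later than the watermark, and evicts expired window state."""
    def __init__(self):
        self.counts = defaultdict(int)  # open windows: start -> count
        self.finalized = {}             # closed windows: start -> count
        self.max_event_ts = 0           # highest event time seen so far

    def process(self, ts):
        self.max_event_ts = max(self.max_event_ts, ts)
        watermark = self.max_event_ts - LATENESS
        if ts < watermark:
            return  # too late: dropped, like events behind Spark's watermark
        self.counts[window_start(ts)] += 1
        # State cleanup: a window that ends at or before the watermark can
        # never receive more events, so its state is finalized and evicted.
        for w in [w for w in self.counts if w + WINDOW <= watermark]:
            self.finalized[w] = self.counts.pop(w)
```

In Spark Structured Streaming the equivalent is `withWatermark("eventTime", "10 minutes")` combined with `groupBy(window(...))`; Spark tracks the watermark per query and evicts expired window state automatically.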

About the Course

I created Apache Spark and Databricks - Stream Processing in Lakehouse using the Python language and the PySpark API. This course will help you understand real-time stream processing with Apache Spark and Databricks Cloud and apply that knowledge to build real-time stream processing solutions. The course is example-driven and follows a working-session style: we take a live-coding approach and explain every concept as it comes up.

Capstone Project

This course also includes an End-To-End Capstone project. The project will help you understand the real-life project design, coding, implementation, testing, and CI/CD approach.

Who should take this Course?

I designed this course for software engineers who want to develop real-time stream processing pipelines and applications using Apache Spark. It is also for data architects and data engineers responsible for designing and building their organization’s data-centric infrastructure. A third group is managers and architects who do not work on Spark implementations directly but work with the teams that implement Apache Spark at the ground level.

Spark Version Used in the Course

This course uses Apache Spark 3.5. I have tested all the source code and examples on Azure Databricks Cloud using Databricks Runtime 14.1.

Taught by

Prashant Kumar Pandey and Learning Journal

Related Courses

Big Data
University of Adelaide via edX
Advanced Data Science with IBM
IBM via Coursera
Analysing Unstructured Data using MongoDB and PySpark
Coursera Project Network via Coursera
Apache Spark for Data Engineering and Machine Learning
IBM via edX
Apache Spark (TM) SQL for Data Analysts
Databricks via Coursera