YoVDO

Apache Spark Project for Beginners: A Complete Project Guide

Offered By: Udemy

Tags

Apache Spark Courses Data Visualization Courses Scala Courses MySQL Courses MongoDB Courses Apache Kafka Courses Relational Databases Courses NoSQL Databases Courses Spark Structured Streaming Courses

Course Description

Overview

Real-Time Message Processing Application

What you'll learn:
  • End-to-End Apache Spark Project Development
  • How a Real-Time Streaming Application Works
  • Features of Spark Structured Streaming Using Scala
  • How Apache Kafka Works with Apache Spark
  • How to Use a NoSQL Database (MongoDB) and an RDBMS (MySQL) in a Real-Time Streaming Application
  • How to Build a Visualisation Dashboard Using Python

End-to-End Project Development of a Real-Time Message Processing Application: in this Apache Spark project, we will build a Meetup RSVP stream processing application using Apache Spark with the Scala API, Spark Structured Streaming, Apache Kafka, Python, Python Dash, MongoDB and MySQL. We will build a data pipeline that carries data from a streaming data source (the Meetup.com RSVP Stream API) all the way to data visualisation, using Apache Spark and other big data frameworks.
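
To make the pipeline stages concrete, here is a minimal pure-Python sketch of the flow described above: ingest raw messages, split them into micro-batches, parse and filter them, and keep a running aggregate that a dashboard could read. It is not Spark or Kafka code; the message layout and field names such as `group_country` are hypothetical, and the simple `micro_batches` helper only stands in for Spark Structured Streaming's trigger-based micro-batch processing.

```python
import json
from collections import Counter

# Hypothetical RSVP messages; the field names are assumptions for
# illustration, not the documented Meetup RSVP API schema.
RAW_MESSAGES = [
    '{"rsvp_id": 1, "response": "yes", "group": {"group_country": "us"}}',
    '{"rsvp_id": 2, "response": "no", "group": {"group_country": "gb"}}',
    '{"rsvp_id": 3, "response": "yes", "group": {"group_country": "us"}}',
    'not valid json',
    '{"rsvp_id": 4, "response": "yes", "group": {"group_country": "in"}}',
]


def parse(message):
    """Parse one raw message; return None for malformed input."""
    try:
        return json.loads(message)
    except json.JSONDecodeError:
        return None


def micro_batches(messages, batch_size):
    """Split the incoming stream into fixed-size micro-batches."""
    for start in range(0, len(messages), batch_size):
        yield messages[start:start + batch_size]


def run_pipeline(messages, batch_size=2):
    """Ingest -> parse/filter -> aggregate, keeping a running count of
    'yes' RSVPs per country (the state a dashboard would display)."""
    totals = Counter()
    for batch in micro_batches(messages, batch_size):
        # filter(None, ...) drops messages that failed to parse.
        for record in filter(None, map(parse, batch)):
            if record.get("response") == "yes":
                totals[record["group"]["group_country"]] += 1
    return totals


if __name__ == "__main__":
    print(dict(run_pipeline(RAW_MESSAGES)))  # {'us': 2, 'in': 1}
```

The malformed message is skipped instead of stopping the stream, which is the usual design choice for a long-running streaming job.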

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.
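
The idea of data parallelism over partitions can be sketched in plain Python: split the data into partitions, run the same function on each partition independently, then combine the partial results. This is only an analogy, not Spark's API; the helper names below are hypothetical, and threads are used for simplicity, whereas Spark runs one task per partition on executors spread across a cluster.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, num_partitions):
    """Split data into roughly equal partitions (round-robin)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def process_partition(part):
    """Per-partition work: square each element and sum locally."""
    return sum(x * x for x in part)


def parallel_sum_of_squares(data, num_partitions=4):
    parts = partition(data, num_partitions)
    # Each partition is processed independently, similar to how Spark
    # schedules one task per partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(process_partition, parts))
    # Combine the partial results (comparable to a reduce step).
    return reduce(operator.add, partials, 0)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))  # 285
```

Because each partial result is a deterministic function of its input partition, a lost result can simply be recomputed from the source data, which is the intuition behind Spark's lineage-based fault tolerance.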

Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system, written in Java and Scala and developed by the Apache Software Foundation. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
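
A toy model can show how Kafka organises a topic: records are appended to one of several partitions chosen by hashing the record key, each record gets an increasing offset within its partition, and consumers read from an offset without deleting anything. The `PartitionedLog` class is hypothetical and is not the Kafka client API; `zlib.crc32` stands in for Kafka's default partitioner, which hashes keys with murmur2.

```python
import zlib


class PartitionedLog:
    """Toy model of a Kafka topic: N append-only partitions with offsets."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Same key -> same partition, so per-key ordering is preserved.
        # crc32 is a stdlib stand-in for Kafka's murmur2 key hashing.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a record and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def consume(self, partition, offset, max_records=10):
        """Read records starting at an offset; reading does not remove them."""
        return self.partitions[partition][offset:offset + max_records]


if __name__ == "__main__":
    log = PartitionedLog(num_partitions=3)
    p, first = log.produce("us", "rsvp-1")
    _, second = log.produce("us", "rsvp-2")
    print(log.consume(p, first))
```

Keeping records after they are read is what lets several independent consumers, such as a Spark streaming job and a separate archiver, process the same feed at their own pace.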

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
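
The MapReduce programming model mentioned above can be illustrated with the classic word count, written here as three plain-Python phases: map emits `(word, 1)` pairs, shuffle groups the pairs by key, and reduce sums each group. This is a single-machine sketch of the model only; in Hadoop the map and reduce tasks run on many nodes and the shuffle moves data between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["Spark and Kafka", "Kafka and MongoDB"]))
    # {'spark': 1, 'and': 2, 'kafka': 2, 'mongodb': 1}
```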

A NoSQL (originally referring to "non-SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
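
The difference between the two storage models can be shown with one RSVP record. As a document (the shape a store such as MongoDB keeps), the nested member and the list of topics stay inside a single record; in a relational database the repeating list needs its own table and a join to reassemble. In this sketch `sqlite3` stands in for MySQL and a JSON string stands in for a stored MongoDB document; the record layout and table names are hypothetical.

```python
import json
import sqlite3

# One RSVP as a nested document (field names are hypothetical).
rsvp_document = {
    "rsvp_id": 1,
    "response": "yes",
    "member": {"member_id": 42, "name": "Ada"},
    "group": {"group_name": "Spark Users", "topics": ["spark", "kafka"]},
}


def store_document(doc):
    """Document model: serialize the whole nested record as one unit."""
    return json.dumps(doc)


def store_relational(doc):
    """Relational model: flatten the record into two tables.
    sqlite3 is used here as a stand-in for MySQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rsvp (rsvp_id INTEGER PRIMARY KEY, response TEXT, "
        "member_id INTEGER, member_name TEXT, group_name TEXT)"
    )
    # The list of topics cannot live in a single column, so it gets its
    # own table keyed by rsvp_id.
    conn.execute("CREATE TABLE group_topic (rsvp_id INTEGER, topic TEXT)")
    conn.execute(
        "INSERT INTO rsvp VALUES (?, ?, ?, ?, ?)",
        (doc["rsvp_id"], doc["response"], doc["member"]["member_id"],
         doc["member"]["name"], doc["group"]["group_name"]),
    )
    conn.executemany(
        "INSERT INTO group_topic VALUES (?, ?)",
        [(doc["rsvp_id"], topic) for topic in doc["group"]["topics"]],
    )
    conn.commit()
    return conn
```

Neither layout is universally better: the document form matches the shape of the incoming stream, while the relational form makes joins and aggregate queries across records straightforward.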


Taught by

PARI MARGU

Related Courses

Deploying Apache Pulsar to Google Kubernetes Engine
Pluralsight
Stream Processing Design Patterns with Kafka Streams
LinkedIn Learning
Apache Kafka Series - Confluent Schema Registry & REST Proxy
Udemy
Apache Kafka Series - Kafka Connect Hands-on Learning
Udemy
The Complete Apache Kafka Practical Guide
Udemy