Streaming Big Data with Spark Streaming and Scala

Offered By: Udemy

Tags

Apache Spark Courses Big Data Courses Machine Learning Courses Scala Courses Data Ingestion Courses Spark Streaming Courses Spark SQL Courses

Course Description

Overview

Spark Streaming tutorial covering Spark Structured Streaming, Kafka integration, and streaming big data in real time.

What you'll learn:
  • Process massive streams of real-time data using Spark Streaming
  • Integrate Spark Streaming with data sources, including Kafka, Flume, and Kinesis
  • Use Spark 2's Structured Streaming API
  • Create Spark applications using the Scala programming language
  • Output transformed real-time data to Cassandra or file systems
  • Integrate Spark Streaming with Spark SQL to query streaming data in real time
  • Train machine learning models with streaming data, and use those models for real-time predictions
  • Ingest Apache access log data and transform streams of it
  • Receive real-time streams of Twitter feeds
  • Maintain stateful data across a continuous stream of input data
  • Query streaming data across sliding windows of time

WARNING: This course includes activities that involve Twitter integration, using an API Twitter has recently disabled. Following along hands-on is no longer possible for these activities, but you can still learn about streaming from watching the videos.

"Big Data" analysis is a hot and highly valuable skill. Thing is, "big data" never stops flowing! Spark Streaming is a new and quickly developing technology for processing massive data sets as they are created - why wait for some nightly analysis to run when you can constantly update your analysis in real time, all the time? Whether it's clickstream data from a big website, sensor data from a massive "Internet of Things" deployment, financial data, or something else - Spark Streaming is a powerful technology for transforming and analyzing that data right when it is created, all the time.

You'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

This course gets your hands on some real live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models! You'll write and run real Spark Streaming jobs right at home on your own PC, and toward the end of the course, we'll show you how to take those jobs to a real Hadoop cluster and run them in a production environment too.

Across over 30 lectures and almost 6 hours of video content, you'll:

  • Get a crash course in the Scala programming language

  • Learn how Apache Spark operates on a cluster

  • Set up discretized streams with Spark Streaming and transform them as data is received

  • Use Structured Streaming to stream into DataFrames in real time

  • Analyze streaming data over sliding windows of time

  • Maintain stateful information across streams of data

  • Connect Spark Streaming with highly scalable sources of data, including Kafka, Flume, and Kinesis

  • Dump streams of data in real time to NoSQL databases such as Cassandra

  • Run SQL queries on streamed data in real time

  • Train machine learning models in real time with streaming data, and use them to make predictions that keep getting better over time

  • Package, deploy, and run self-contained Spark Streaming code on a real Hadoop cluster using Amazon Elastic MapReduce

This course is filled with achievable activities and exercises to reinforce your learning. By the end of this course, you'll be confidently creating Spark Streaming scripts in Scala, and be prepared to tackle massive streams of data in a whole new way. You'll be surprised at how easy Spark Streaming makes it!


Taught by

Sundog Education by Frank Kane and Frank Kane

Related Courses

FinTech for Finance and Business Leaders
ACCA via edX
Accounting Data Analytics
University of Illinois at Urbana-Champaign via Coursera
Advanced AI on Microsoft Azure: Ethics and Laws, Research Methods and Machine Learning
Cloudswyft via FutureLearn
Ethics, Laws and Implementing an AI Solution on Microsoft Azure
Cloudswyft via FutureLearn
Post Graduate Certificate in Advanced Machine Learning & AI
Indian Institute of Technology Roorkee via Coursera