Apache Spark Essential Training: Big Data Engineering
Offered By: LinkedIn Learning
Course Description
Overview
Learn how to make Apache Spark work with other Big Data technologies and put together an end-to-end project that can solve a real-world business problem.
Syllabus
Introduction
- Driving big data engineering with Apache Spark
- Course prerequisites
- Setting up the exercise files
- What is data engineering?
- Data engineering vs. data analytics vs. data science
- Data engineering functions
- Batch vs. real-time processing
- Data engineering with Spark
- Spark architecture review
- Parallel processing with Spark
- Spark execution plan
- Stateful stream processing
- Spark analytics and ML
- Batch processing use case: Problem statement
- Batch processing use case: Design
- Setting up the local DB
- Uploading stock to a central store
- Aggregating stock across warehouses
- Real-time use case: Problem
- Real-time use case: Design
- Generating a visits data stream
- Building a website analytics job
- Executing the real-time pipeline
- Batch vs. real-time options
- Scaling extraction and loading operations
- Scaling processing operations
- Building resiliency
- Project exercise requirements
- Solution design
- Extracting long last actions
- Building a scorecard
- More about Apache Spark
Taught by
Kumaran Ponnambalam