Spark and Data Lakes
Offered By: Udacity
Course Description
Overview
In this course, you will learn about the big data ecosystem and how to use Spark to work with
massive datasets. You'll also learn how to store big data in a data lake and query it with Spark.
Syllabus
- Introduction to Spark and Data Lakes
  - In this course, you'll learn how Spark evaluates code and uses distributed computing to process and transform data. You'll work in the big data ecosystem to build data lakes and data lakehouses.
- Big Data Ecosystem, Data Lakes, and Spark
  - In this lesson, you will learn about the problems that Apache Spark is designed to solve. You'll also learn about the broader big data ecosystem and how Spark fits into it.
- Spark Essentials
  - In this lesson, we'll dive into how to use Spark for wrangling, filtering, and transforming distributed data with PySpark and Spark SQL.
- Using Spark in AWS
  - In this lesson, you will learn to use Spark and work with data lakes on Amazon Web Services using S3, AWS Glue, and AWS Glue Studio.
- Ingesting and Organizing Data in a Lakehouse
  - In this lesson, you'll work with lakehouse zones. You will build and configure these zones in AWS.
- STEDI Human Balance Analytics
  - In this project, you'll work with sensor data used to train a machine learning model. You'll load JSON data stored in S3 from a data lake into Athena tables using Spark and AWS Glue.
Taught by
Sean Murdock - Instructor
Related Courses
- Réalisez des calculs distribués sur des données massives [Perform Distributed Computations on Massive Data] (CentraleSupélec via OpenClassrooms)
- Data Management in the Cloud (Arizona State University via Coursera)
- Programming with Cloud IoT Platforms (Pohang University of Science and Technology via Coursera)
- AWS IoT: Developing and Deploying an Internet of Things (Amazon Web Services via edX)
- AWS Computer Vision: Getting Started with GluonCV (Amazon Web Services via Coursera)