YoVDO

Apache Spark 3 Fundamentals

Offered By: Pluralsight

Tags

Apache Spark, Databricks, Data Processing, DataFrames, RDDs

Course Description

Overview

Learn the fundamentals of Apache Spark 3: process data, set up the environment, use RDDs and DataFrames, optimize applications, and build pipelines with Databricks and Azure Synapse. This course familiarizes you with Spark's growing ecosystem.

Apache Spark is one of the most widely used analytics engines. It performs distributed data processing and can handle petabytes of data. Spark can work with a variety of data formats, process data at high speeds, and support multiple use cases. Version 3 of Spark brings a whole new set of features and optimizations.

In this course, Apache Spark 3 Fundamentals, you'll learn how Apache Spark can be used to process large volumes of data, whether batch or streaming data, and about the growing ecosystem of Spark. First, you'll learn what Apache Spark is, its architecture, and its execution model. You'll then see how to set up the Spark environment. Next, you'll learn about two Spark APIs – RDDs and DataFrames – and see how to use them to extract, analyze, clean, and transform batch data. Then, you'll learn various techniques to optimize your Spark applications, as well as the new optimization features of Apache Spark 3. After that, you'll see how to reliably store data in a Data Lake using the Delta Lake format and build streaming pipelines with Spark. Finally, you'll see how to use Spark in cloud services like Databricks and Azure Synapse Analytics.

By the end of this course, you'll have the knowledge and skills to work with Apache Spark and use its capabilities and ecosystem to build large-scale data processing pipelines. So, let's get started!

Syllabus

  • Course Overview 1 min
  • Getting Started with Apache Spark 30 mins
  • Setting up Spark Environment 38 mins
  • Working with RDDs - Resilient Distributed Datasets 53 mins
  • Cleaning and Transforming Data with DataFrames 50 mins
  • Working with Spark SQL, UDFs, and Common DataFrame Operations 34 mins
  • Performing Optimizations in Spark 50 mins
  • Features in Apache Spark 3 34 mins
  • Building Reliable Data Lake with Spark and Delta Lake 47 mins
  • Handling Streaming Data with Spark Structured Streaming 26 mins
  • Working with Spark in Cloud 11 mins

Taught by

Mohit Batra

Related Courses

Julia Scientific Programming
University of Cape Town via Coursera
Spark
Udacity
AI Workflow: Enterprise Model Deployment
IBM via Coursera
Apache Spark with Scala - Hands On with Big Data!
Udemy
Taming Big Data with Apache Spark and Python - Hands On!
Udemy