Apache PySpark by Example

Offered by: LinkedIn Learning

Tags

PySpark Courses, Python Courses, Apache Spark Courses

Course Description

Overview

Get up and running with Apache Spark quickly. This practical, hands-on course shows Python users how to work with Apache PySpark and leverage the power of Spark for data science.

Syllabus

Introduction
  • Apache PySpark
  • What you should know
1. Introduction to Apache Spark
  • The Apache Spark ecosystem
  • Why Spark?
  • Spark origins and Databricks
  • Spark components
  • Partitions, transformations, lazy evaluations, and actions
2. Technical Setup
  • Set up the lab environment
  • Download a dataset
  • Importing
3. Working with the DataFrame API
  • The DataFrame API
  • Working with DataFrames
  • Schemas
  • Working with columns
  • Working with rows
  • Challenge
  • Solution
4. Functions
  • Built-in functions
  • Working with dates
  • User-defined functions
  • Working with joins
  • Challenge
  • Solution
5. Resilient Distributed Datasets (RDDs)
  • RDDs
  • Working with RDDs
Conclusion
  • Next steps

Taught by

Jonathan Fernandes

Related Courses

  • CS115x: Advanced Apache Spark for Data Science and Data Engineering (University of California, Berkeley via edX)
  • Big Data Analytics (University of Adelaide via edX)
  • Big Data Essentials: HDFS, MapReduce and Spark RDD (Yandex via Coursera)
  • Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames (Yandex via Coursera)
  • Introduction to Apache Spark and AWS (University of London International Programmes via Coursera)