Apache PySpark by Example
Offered By: LinkedIn Learning
Course Description
Overview
Get up and running with Apache Spark quickly. This practical, hands-on course shows Python users how to work with PySpark, the Python API for Apache Spark, to leverage the power of Spark for data science.
Syllabus
Introduction
- Apache PySpark
- What you should know
- The Apache Spark ecosystem
- Why Spark?
- Spark origins and Databricks
- Spark components
- Partitions, transformations, lazy evaluations, and actions
- Set up the lab environment
- Download a dataset
- Importing
- The DataFrame API
- Working with DataFrames
- Schemas
- Working with columns
- Working with rows
- Challenge
- Solution
- Built-in functions
- Working with dates
- User-defined functions
- Working with joins
- Challenge
- Solution
- RDDs
- Working with RDDs
- Next steps
Taught by
Jonathan Fernandes
Related Courses
- CS115x: Advanced Apache Spark for Data Science and Data Engineering (University of California, Berkeley via edX)
- Big Data Analytics (University of Adelaide via edX)
- Big Data Essentials: HDFS, MapReduce and Spark RDD (Yandex via Coursera)
- Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames (Yandex via Coursera)
- Introduction to Apache Spark and AWS (University of London International Programmes via Coursera)