Apache PySpark by Example
Offered By: LinkedIn Learning
Course Description
Overview
Get up and running with Apache Spark quickly. This practical, hands-on course shows Python users how to work with PySpark, the Python API for Apache Spark, and apply Spark to data science.
Syllabus
Introduction
- Apache PySpark
- What you should know
- The Apache Spark ecosystem
- Why Spark?
- Spark origins and Databricks
- Spark components
- Partitions, transformations, lazy evaluations, and actions
- Set up the lab environment
- Download a dataset
- Importing
- The DataFrame API
- Working with DataFrames
- Schemas
- Working with columns
- Working with rows
- Challenge
- Solution
- Built-in functions
- Working with dates
- User-defined functions
- Working with joins
- Challenge
- Solution
- RDDs
- Working with RDDs
- Next steps
Taught by
Jonathan Fernandes
Related Courses
- Design Computing: 3D Modeling in Rhinoceros with Python/Rhinoscript (University of Michigan via Coursera)
- A Practical Introduction to Test-Driven Development (LearnQuest via Coursera)
- FinTech for Finance and Business Leaders (ACCA via edX)
- Access Bioinformatics Databases with Biopython (Coursera Project Network via Coursera)
- Accounting Data Analytics (University of Illinois at Urbana-Champaign via Coursera)