Apache PySpark by Example

Offered By: LinkedIn Learning

Tags

PySpark Courses, Python Courses, Apache Spark Courses

Course Description

Overview

Get up and running with Apache Spark quickly. This practical, hands-on course shows Python users how to use PySpark to apply the power of Spark to data science.

Syllabus

Introduction
  • Apache PySpark
  • What you should know
1. Introduction to Apache Spark
  • The Apache Spark ecosystem
  • Why Spark?
  • Spark origins and Databricks
  • Spark components
  • Partitions, transformations, lazy evaluations, and actions
2. Technical Setup
  • Set up the lab environment
  • Download a dataset
  • Importing
3. Working with the DataFrame API
  • The DataFrame API
  • Working with DataFrames
  • Schemas
  • Working with columns
  • Working with rows
  • Challenge
  • Solution
4. Functions
  • Built-in functions
  • Working with dates
  • User-defined functions
  • Working with joins
  • Challenge
  • Solution
5. Resilient Distributed Datasets (RDDs)
  • RDDs
  • Working with RDDs
Conclusion
  • Next steps

Taught by

Jonathan Fernandes

Related Courses

Fundamentals of Scalable Data Science
IBM via Coursera
Data Science and Engineering with Spark
Berkeley University of California via edX
Master of Machine Learning and Data Science
Imperial College London via Coursera
Data Analysis Using Pyspark
Coursera Project Network via Coursera
Building Machine Learning Pipelines in PySpark MLlib
Coursera Project Network via Coursera