YoVDO

PySpark Tutorial

Offered By: freeCodeCamp

Tags

PySpark Courses, Python Courses, Apache Spark Courses

Course Description

Overview

Work through a tutorial on PySpark, the Python API for Apache Spark, an engine built for large-scale data processing and machine learning. Topics include an introduction to PySpark, working with DataFrames, handling missing values, groupby and aggregate functions, and installing and using MLlib. The course also introduces Databricks and shows how to implement linear regression there on a single cluster. Accompanying code is available on GitHub. The instructor is Krish Naik, and the tutorial runs 1-2 hours.

Syllabus

PySpark Introduction.
PySpark DataFrame Part 1.
PySpark Handling Missing Values.
PySpark DataFrame Part 2.
PySpark GroupBy and Aggregate Functions.
PySpark MLlib Installation and Implementation.
Introduction to Databricks.
Implementing Linear Regression Using Databricks on a Single Cluster.


Taught by

freeCodeCamp.org

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering (University of California, Berkeley via edX)
Big Data Analytics (University of Adelaide via edX)
Big Data Essentials: HDFS, MapReduce and Spark RDD (Yandex via Coursera)
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames (Yandex via Coursera)
Introduction to Apache Spark and AWS (University of London International Programmes via Coursera)