PySpark Tutorial
Offered By: freeCodeCamp
Course Description
Overview
A tutorial on PySpark, the Python interface for Apache Spark, which is designed for large-scale data processing and machine learning. Topics include an introduction to PySpark, working with DataFrames, handling missing values, groupby and aggregate functions, and using MLlib. The course also introduces Databricks and walks through implementing linear regression there on a single cluster. The accompanying code is available on GitHub. The instructor is Krish Naik, and the course runs 1-2 hours.
Syllabus
PySpark Introduction.
PySpark DataFrame Part 1.
PySpark Handling Missing Values.
PySpark DataFrame Part 2.
PySpark GroupBy and Aggregate Functions.
PySpark MLlib Installation and Implementation.
Introduction To Databricks.
Implementing Linear Regression using Databricks in Single Clusters.
Taught by
freeCodeCamp.org
Related Courses
Analysing Unstructured Data using MongoDB and PySpark (Coursera Project Network via Coursera)
Big Data, Hadoop, and Spark Basics (IBM via edX)
Cleaning and Exploring Big Data using PySpark (Coursera Project Network via Coursera)
Data Analysis Using Pyspark (Coursera Project Network via Coursera)
Data Science and Engineering with Spark (Berkeley University of California via edX)