Machine Learning with PySpark: Data Analysis using SQL
Offered By: Coursera Project Network via Coursera
Course Description
Overview
This Guided Project is for beginning Python developers. In this 1-hour, project-based course, you will learn how to describe PySpark and machine learning, use PySpark to capture data, use PySpark SQL to observe the data, use PySpark MLlib to prepare training data, and use PySpark MLlib to predict an outcome. To achieve this, we will use PySpark to read data into a PySpark DataFrame, view the data using PySpark SQL, prepare the test and training data from a heart disease data set, and attempt to predict heart disease from the independent variables.
Syllabus
- Project Overview
Taught by
David Dalsveen
Related Courses
- CS115x: Advanced Apache Spark for Data Science and Data Engineering (University of California, Berkeley via edX)
- Big Data Analytics (University of Adelaide via edX)
- Big Data Essentials: HDFS, MapReduce and Spark RDD (Yandex via Coursera)
- Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames (Yandex via Coursera)
- Introduction to Apache Spark and AWS (University of London International Programmes via Coursera)