Feature Engineering with PySpark
Offered By: DataCamp
Course Description
Overview
Learn the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering.
The real world is messy, and your job is to make sense of it. Toy datasets like MTCars and Iris are the result of careful curation and cleaning; even so, their data must be transformed before machine learning algorithms can use it to extract meaning, forecast, classify, or cluster. This course covers the gritty details that data scientists spend 70-80% of their time on: data wrangling and feature engineering. With datasets growing ever larger, let's use PySpark to cut this Big Data problem down to size!
Syllabus
- Exploratory Data Analysis
  - Get to know a bit about your problem before you dive in! Then learn how to inspect your dataset statistically and visually.
- Wrangling with Spark Functions
  - Real data is rarely clean and ready for analysis. In this chapter, learn to remove unneeded information, handle missing values, and add data to your analysis.
- Feature Engineering
  - In this chapter, learn how to create new features for your machine learning model to learn from. We'll look at generating them by combining fields, extracting values from messy columns, or encoding them for better results.
- Building a Model
  - In this chapter, we'll learn how to choose which type of model we want. Then we'll learn how to apply our data to the model and evaluate it. Lastly, we'll learn how to interpret the results and save the model for later!
Taught by
John Hogue
Related Courses
- Advanced Machine Learning (The Open University via FutureLearn)
- Exploring and Analyzing Fifa's Datasets Using Python (Coursera Project Network via Coursera)
- Applied Data Science for Data Analysts (Databricks via Coursera)
- Automatic Machine Learning with H2O AutoML and Python (Coursera Project Network via Coursera)
- Microsoft Future Ready: Using Python Programming to Explore the Principles of Machine Learning (Cloudswyft via FutureLearn)