Cleaning and Preparing Data Course
Offered By: Treehouse
Course Description
Overview
We rely on data to answer important questions, whether we are trying to make the best business decisions or determine the effectiveness of a new medical treatment. But our analyses are only as accurate as the data we are using, and incorrect or “dirty” data can lead to incorrect conclusions and assumptions. Data preparation, also called “cleaning” or “scrubbing”, is an important part of ensuring our analyses are accurate and useful.
What you'll learn
- Cleaning and scrubbing data
- Potential problems within datasets
- Understanding your dataset
- Handling bad data
Syllabus
“Clean” and “Dirty” Data
Welcome! In this stage, you will learn why a properly cleaned dataset is important and about some of the problems you may encounter when cleaning one. We will also take our first look at the data we will be using throughout this course.
6 steps
- What is Data Cleaning? (3:51)
- Types of Bad Data (5:18)
- Data Preparation Basics (7 questions)
- Understanding Your Dataset (2:15)
- Exploring Your Dataset (7:15)
- Understanding Your Dataset (7 questions)
Handling Bad Data
Now that we know a little bit about our dataset and the data cleaning process, we will take a closer look at some common issues using our example dataset. Sometimes these issues can be fixed, while other times it’s best to remove the data from our analyses. We can even write programs to help us automate some of the data preparation process, saving time and effort.
10 steps
- Simple Data Issues (8:37)
- Sensible Column Names and Values (6:12)
- Fixing or Excluding Data (3:39)
- Simple Fixes and Exclusions Review (11 questions)
- Missing Data (12:31)
- Fixes and Exclusions for Complex Issues (5 questions)
- Duplicated Data (9:02)
- Infeasible and Extreme Data (8:51)
- Automating Data Preparation (8:08)
- Automating Data Preparation (5 questions)
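This stage covers fixing some problems, excluding others, and writing a program to automate the work. As a rough illustration only (not the course's own code), here is a minimal sketch using just the Python standard library. The field names, the set of missing-value markers, and the 0 to 120 range for a feasible age are hypothetical choices made for this example.

```python
from typing import Any, Dict, Iterable, List, Optional

# Assumption: how missing values happen to be written in this hypothetical data.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def normalize_key(key: str) -> str:
    """Simple fix: turn a messy column name like ' Heart Rate ' into 'heart_rate'."""
    return "_".join(key.strip().lower().split())


def parse_number(value: Any) -> Optional[float]:
    """Return the value as a float, or None if it is missing or not numeric."""
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def clean_rows(rows: Iterable[Dict[str, Any]], numeric_field: str,
               low: float, high: float) -> List[Dict[str, Any]]:
    """Fix what can be fixed and exclude what cannot.

    - Column names and string values are normalized (fixed).
    - Rows whose numeric field is missing or unparseable are excluded.
    - Rows whose numeric value is outside [low, high] are excluded as infeasible.
    - Exact duplicates (after fixing) are excluded, keeping the first occurrence.
    Assumes every value is hashable (strings and numbers here).
    """
    cleaned: List[Dict[str, Any]] = []
    seen = set()
    for row in rows:
        fixed = {normalize_key(k): v.strip() if isinstance(v, str) else v
                 for k, v in row.items()}
        number = parse_number(fixed.get(numeric_field))
        if number is None:
            continue  # missing data: exclude the row
        if not low <= number <= high:
            continue  # infeasible or extreme value: exclude the row
        fixed[numeric_field] = number
        signature = tuple(sorted(fixed.items()))
        if signature in seen:
            continue  # duplicated data: keep only the first copy
        seen.add(signature)
        cleaned.append(fixed)
    return cleaned


# Hypothetical raw records with inconsistent column names and values.
raw = [
    {" Name ": " Ana ", "Age": "34"},
    {"name": "Ben", "age": ""},       # missing age
    {"Name": "Cy", "AGE": "-5"},      # infeasible age
    {"Name": "Ana", "Age": "34.0"},   # duplicate of the first row once fixed
    {"Name": "Dee", "Age": "41"},
]

print(clean_rows(raw, "age", 0, 120))
# → [{'name': 'Ana', 'age': 34.0}, {'name': 'Dee', 'age': 41.0}]
```

Note that duplicates are checked only after the simple fixes, so " Ana " with age "34" and "Ana" with age "34.0" are recognized as the same record.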
Selecting Relevant Data
While it may seem like more data is always better, usually we only want to look at the information that’s relevant to the question we are trying to answer. In this stage, we will look at different ways of choosing the most applicable data.
6 steps
- Making Your Dataset Smaller (2:11)
- Choosing the Right Features (8:45)
- Selecting the Right Data (6 questions)
- Automated Feature Selection (5:55)
- Cleaning and Preparing Data (1:31)
- Automating Feature Selection (5 questions)
Taught by
Alyssa Batula
Related Courses
- Passion Driven Statistics (Wesleyan University via Coursera)
- Machine Learning With Big Data (University of California, San Diego via Coursera)
- Big Data - Capstone Project (University of California, San Diego via Coursera)
- Data Science at Scale - Capstone Project (University of Washington via Coursera)
- Data Analysis: Final Project [Анализ данных: финальный проект] (Moscow Institute of Physics and Technology via Coursera)