Cleaning Bad Data in R

Offered By: LinkedIn Learning

Tags

Data Cleaning Courses, R Programming Courses, Tidyverse Courses, Data Integrity Courses, Tidy Data Courses, Data Formatting Courses

Course Description

Overview

Clean up your data in R. Learn how to identify and address data integrity issues such as missing and duplicate data, using R and the tidyverse.

Syllabus

Introduction
  • Data is messy
  • What you need to know
1. Missing Data
  • Types of missing data
  • Missing values
  • Missing rows
  • Aggregations and missing values
2. Duplicated Data
  • Duplicated rows and values
  • Aggregations in the data set
3. Formatting Data
  • Converting dates
  • Unit conversions
  • Numbers stored as text
  • Text improperly converted to numbers
  • Inconsistent spellings
4. Outliers
  • Screening for outliers
  • Handling outliers
  • Outliers use case
  • Outliers in subgroups
  • Detecting illogical values
5. Tidy Data
  • What is tidy data?
  • Variables, observations, and values
  • Common data problems
  • Wide vs. long data sets
  • Making wide data sets long
  • Making long data sets wide
6. Red Flags
  • Suspicious values
  • Suspicious multiples
Conclusion
  • What's next?

Taught by

Mike Chapple

Related Courses

Big Data
University of Adelaide via edX
Advanced Reproducibility in Cancer Informatics
Johns Hopkins University via Coursera
Advanced R Programming
Johns Hopkins University via Coursera
Advanced Statistics for Data Science
Johns Hopkins University via Coursera
Fundamentos de Ciencia de Datos con R
Universidad Anáhuac via edX