Prepping Data for Analysis Using R

Offered By: Open Data Science via YouTube

Tags

R Programming, Data Analysis, Linear Regression, RStudio, Data Preparation, Overfitting, Categorical Variables

Course Description

Overview

This conference talk from ODSC West 2015 covers data preparation techniques for analysis in R. It introduces the fundamentals of data quality and shows how to automate routine preparation steps in a principled manner. Interactive demonstrations in the open-source R analysis environment illustrate common data preparation pitfalls and how to detect and fix them. Materials are available from the provided GitHub repository for following along or practicing later. Topics include faulty sensor situations, missing variables, novel categorical levels, and compact coding, along with treatment plans, user interfaces, and operational issues in data preparation. The presenters, John Mount and Nina Zumel, are experienced data scientists and authors; the talk also covers linear regression, calibration, interpretation, and avoiding overfitting.

Syllabus

Intro
Workshop Outline
Workshop Agenda
Workshop Goals
Data Preparation
Faulty Sensor Situation
Systematically missing variables
Building missing variables
Missing values
Pragmatic solution
Novel categorical levels
New data
Wyoming
Chemical categorical variables
Dealing with new levels
vtreat solution
Categorical variables
Compact coding
Indicator vs numerical variables
Treatment Plan
User Interface
Treatment Example
Linear Regression
Calibration
Interpretation
Operational Issues
Overfitting
Data fussing
John Mount


Taught by

Open Data Science
