Prepping Data for Analysis Using R
Offered By: Open Data Science via YouTube
Course Description
Overview
Explore data preparation techniques for analysis in R in this conference talk from ODSC West 2015. Learn the fundamentals of data quality and how to automate routine preparation steps in a principled manner. See common data preparation pitfalls, and how to detect and fix them, through interactive demonstrations in the open-source R analysis environment. Download materials from the provided GitHub repository to follow along or practice later. Topics include faulty sensor situations, systematically missing variables, novel categorical levels, and compact coding of categorical variables, along with treatment plans, user interfaces, and operational issues in data preparation. Presented by John Mount and Nina Zumel, data scientists and authors, the talk also covers linear regression, calibration, interpretation, and avoiding overfitting, with the aim of giving you practical skills that improve your data science projects.
Syllabus
Intro
Workshop Outline
Workshop Agenda
Workshop Goals
Data Preparation
Faulty Sensor Situation
Systematically missing variables
Building missing variables
Missing values
Pragmatic solution
Novel categorical levels
New data
Wyoming
Chemical categorical variables
Dealing with new levels
VTreat solution
Categorical variables
Compact coding
Indicator vs numerical variables
Treatment Plan
User Interface
Treatment Example
Linear Regression
Calibration
Interpretation
Operational Issues
Overfitting
Data fussing
John Mount
Taught by
Open Data Science
Related Courses
Practical Machine Learning by Johns Hopkins University via Coursera
Practical Deep Learning For Coders by fast.ai via Independent
機器學習基石下 (Machine Learning Foundations)---Algorithmic Foundations by National Taiwan University via Coursera
Data Analytics Foundations for Accountancy II by University of Illinois at Urbana-Champaign via Coursera
Entraînez un modèle prédictif linéaire by CentraleSupélec via OpenClassrooms