Data Science Foundations: Data Assessment for Predictive Modeling
Offered By: LinkedIn Learning
Course Description
Overview
Explore the data understanding phase of the CRISP-DM methodology for predictive modeling. Find out how to collect, describe, explore, and verify data.
Syllabus
Introduction
- Why data assessment is critical
- A note about the exercise files
- Clarifying how data understanding differs from data visualization
- Introducing the critical data understanding phase of CRISP-DM
- Data assessment in CRISP-DM alternatives: The IBM ASUM-DM and Microsoft TDSP
- Navigating the transition from business understanding to data understanding
- How to organize your work with the four data understanding tasks
- Considerations in gathering the relevant data
- A strategy for processing data sources
- Getting creative about data sources
- How to envision a proper flat file
- Anticipating data integration
- Reviewing basic concepts in the level of measurement
- What is dummy coding? (see the R sketch after the syllabus)
- Expanding our definition of level of measurement
- Taking an initial look at possible key variables
- Dealing with duplicate IDs and transactional data
- How many potential variables (columns) will I have?
- How to deal with high-order multiple nominals
- Challenge: Identifying the level of measurement
- Solution: Identifying the level of measurement
- Introducing the KNIME Analytics Platform
- Tips and tricks to consider during data loading
- Unit of analysis decisions
- Challenge: What should the row be?
- Solution: What should the row be?
- How to uncover the gross properties of the data
- Researching the dataset
- Tips and tricks using simple aggregation commands (see the R sketch after the syllabus)
- A simple strategy for organizing your work
- Describe data demo using the UCI heart dataset
- Challenge: Practice describe data with the UCI heart dataset
- Solution: Practice describe data with the UCI heart dataset
- The explore data task
- How to be effective in doing univariate analysis and data visualization
- Anscombe's quartet (see the R sketch after the syllabus)
- The Data Explorer node feature in KNIME
- How to navigate borderline cases of variable type
- How to be effective in doing bivariate data visualization
- Challenge: Producing bivariate visualizations for case study 1
- Solution: Producing bivariate visualizations for case study 1
- How to utilize an SME's time effectively
- Techniques for working with the top predictors
- Advice for weak predictors
- Tips and tricks when searching for quirks in your data
- Learning when to discard rows
- Introducing ggplot2
- Orienting to R's ggplot2 for powerful multivariate data visualizations (see the R sketch after the syllabus)
- Challenge: Producing multivariate visualizations for case study 1
- Solution: Producing multivariate visualizations for case study 1
- Exploring your missing data options (see the R sketch after the syllabus)
- Why you lose rows to listwise deletion
- Investigating the provenance of the missing data
- Introducing the KDD Cup 1998 data
- What is the pattern of missing data in your data?
- Is the missing data worth saving?
- Assessing imputation as a potential solution
- Exploring and verifying data quality with the UCI heart dataset
- Challenge: Quantifying missing data with the UCI heart dataset
- Solution: Quantifying missing data with the UCI heart dataset
- Why formal reports are important
- Creating a data prep to-do list
- How to prepare for eventual deployment
- Next steps
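Example Code Sketches

A minimal base-R sketch of dummy coding as the syllabus introduces it: a nominal variable is expanded into 0/1 indicator columns, with one level held out as the reference category. The `patients` data frame and its column names are hypothetical.

```r
# Hypothetical data: one nominal variable with three categories
patients <- data.frame(
  id = 1:4,
  chest_pain = factor(c("typical", "atypical", "none", "typical"))
)

# model.matrix() expands the factor into 0/1 indicator (dummy) columns;
# the first level (here "atypical") is dropped as the reference category
dummies <- model.matrix(~ chest_pain, data = patients)
print(dummies)
```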
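A short base-R sketch of simple aggregation commands applied to a hypothetical transactional table with duplicate customer IDs, the situation the duplicate-ID and unit of analysis lessons discuss.

```r
# Hypothetical transactional data: several rows per customer
txns <- data.frame(
  customer_id = c(1, 1, 2, 2, 2, 3),
  amount      = c(20, 35, 10, 5, 60, 15)
)

# How many rows does each ID contribute? (spotting duplicate IDs)
table(txns$customer_id)

# Collapse to one row per customer, a common unit of analysis decision
aggregate(amount ~ customer_id, data = txns, FUN = sum)

# Quick look at the gross properties of the data
summary(txns)
```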
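Anscombe's quartet ships with base R as the `anscombe` data set, so the lesson's point can be reproduced directly: four x/y pairs with nearly identical summary statistics that look completely different once plotted.

```r
data(anscombe)

# All four pairs share almost identical means and correlations...
sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), cor_xy = cor(x, y))
})

# ...yet plotting reveals four very different relationships
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```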
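A minimal ggplot2 sketch of the kind of multivariate visualization the ggplot2 lessons cover, mapping a third variable to color and a fourth to facets. The built-in `mtcars` data stands in for the course's case study data.

```r
library(ggplot2)

# Scatter plot with a third variable on color and a fourth on facets
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  facet_wrap(~ am, labeller = label_both) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       colour = "Cylinders")
```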
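A base-R sketch of the missing-data tasks named in the syllabus: quantifying missingness, seeing how listwise deletion loses rows, and a deliberately naive mean imputation. The `df` data frame is hypothetical, and a real project should first investigate the provenance and pattern of the missing values.

```r
# Hypothetical data frame with missing values
df <- data.frame(
  age  = c(29, 41, NA, 35, 52),
  chol = c(210, NA, NA, 190, 240)
)

# Quantify missing data per column and per row
colSums(is.na(df))
rowSums(is.na(df))

# Listwise deletion: any row with at least one NA is dropped
complete <- na.omit(df)
nrow(df) - nrow(complete)   # number of rows lost

# Naive mean imputation, for illustration only
df$chol[is.na(df$chol)] <- mean(df$chol, na.rm = TRUE)
```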
Taught by
Keith McCormick
Related Courses
- Big Data Analytics in Healthcare (Georgia Institute of Technology via Udacity)
- Model Building and Validation (AT&T via Udacity)
- Maths for Humans: Linear, Quadratic & Inverse Relations (University of New South Wales via FutureLearn)
- Regression Modeling in Practice (Wesleyan University via Coursera)
- Data Science at Scale - Capstone Project (University of Washington via Coursera)