Data Science Foundations: Data Assessment for Predictive Modeling

Offered By: LinkedIn Learning

Tags

CRISP-DM Courses, Data Science Courses, Data Visualization Courses, Data Collection Courses, Predictive Modeling Courses, Data Exploration Courses

Course Description

Overview

Explore the data understanding phase of the CRISP-DM methodology for predictive modeling. Find out how to collect, describe, explore, and verify data.
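
As a rough illustration of the describe and verify tasks mentioned above, the sketch below profiles a small table using only the Python standard library. The sample rows, the column names, and the `describe_columns` helper are hypothetical and are not taken from the course materials, which work in KNIME and R.

```python
# Hypothetical sample rows; None marks a missing value.
rows = [
    {"id": 1, "age": 63, "sex": "M", "chol": 233},
    {"id": 2, "age": 37, "sex": "F", "chol": None},
    {"id": 3, "age": 41, "sex": "F", "chol": 204},
    {"id": 4, "age": None, "sex": "M", "chol": 236},
]


def describe_columns(rows):
    """Return per-column counts of present, missing, and distinct values."""
    summary = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        summary[col] = {
            "present": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return summary


summary = describe_columns(rows)
for col, stats in summary.items():
    print(col, stats)
```

A per-column summary like this is only a first pass: it gives a quick view of the table's gross properties and shows which columns need a closer look for missing data.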

Syllabus

Introduction
  • Why data assessment is critical
  • A note about the exercise files
1. What Is Data Assessment?
  • Clarifying how data understanding differs from data visualization
  • Introducing the critical data understanding phase of CRISP-DM
  • Data assessment in CRISP-DM alternatives: The IBM ASUM-DM and Microsoft TDSP
  • Navigating the transition from business understanding to data understanding
  • How to organize your work with the four data understanding tasks
2. Collect Initial Data
  • Considerations in gathering the relevant data
  • A strategy for processing data sources
  • Getting creative about data sources
  • How to envision a proper flat file
  • Anticipating data integration
3. First Look at the Data
  • Reviewing basic concepts in the level of measurement
  • What is dummy coding?
  • Expanding our definition of level of measurement
  • Taking an initial look at possible key variables
  • Dealing with duplicate IDs and transactional data
  • How many potential variables (columns) will I have?
  • How to deal with high-order multiple nominals
  • Challenge: Identifying the level of measurement
  • Solution: Identifying the level of measurement
4. Data Loading and Unit of Analysis
  • Introducing the KNIME Analytics Platform
  • Tips and tricks to consider during data loading
  • Unit of analysis decisions
  • Challenge: What should the row be?
  • Solution: What should the row be?
5. Describe Data
  • How to uncover the gross properties of the data
  • Researching the dataset
  • Tips and tricks using simple aggregation commands
  • A simple strategy for organizing your work
6. Data Description Case Studies
  • Describe data demo using the UCI heart dataset
  • Challenge: Practice describe data with the UCI heart dataset
  • Solution: Practice describe data with the UCI heart dataset
7. Explore Data Basics
  • The explore data task
  • How to be effective doing univariate analysis and data visualization
  • Anscombe's quartet
  • The Data Explorer node feature in KNIME
  • How to navigate borderline cases of variable type
  • How to be effective in doing bivariate data visualization
  • Challenge: Producing bivariate visualizations for case study 1
  • Solution: Producing bivariate visualizations for case study 1
8. Explore Data Tips and Tricks
  • How to utilize an SME's time effectively
  • Techniques for working with the top predictors
  • Advice for weak predictors
  • Tips and tricks when searching for quirks in your data
  • Learning when to discard rows
  • Introducing ggplot2
  • Orientating to R's ggplot2 for powerful multivariate data visualizations
  • Challenge: Producing multivariate visualizations for case study 1
  • Solution: Producing multivariate visualizations for case study 1
9. Verify Data Quality
  • Exploring your missing data options
  • Why you lose rows to listwise deletion
  • Investigating the provenance of the missing data
10. Missing Data Case Study
  • Introducing the KDD Cup 1998 data
  • What is the pattern of missing data in your data?
  • Is the missing data worth saving?
  • Assessing imputation as a potential solution
11. Explore and Verify Case Studies
  • Exploring and verifying data quality with the UCI heart dataset
  • Challenge: Quantifying missing data with the UCI heart dataset
  • Solution: Quantifying missing data with the UCI heart dataset
12. Making the Transition to Data Preparation
  • Why formal reports are important
  • Creating a data prep to-do list
  • How to prepare for eventual deployment
Conclusion
  • Next steps
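
Two ideas from the syllabus, dummy coding and listwise deletion, can be sketched in a few lines of Python. This is a minimal illustration under assumed data, not the course's own exercise: the `dummy_code` and `listwise_delete` helpers, the color values, and the records are all hypothetical.

```python
def dummy_code(values, reference):
    """Dummy-code a nominal variable: one 0/1 indicator per level,
    omitting the chosen reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]


def listwise_delete(rows):
    """Keep only rows that have no missing (None) value in any column."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Three nominal levels become two indicator columns; "blue" is the reference.
colors = ["red", "green", "blue", "green"]
coded = dummy_code(colors, reference="blue")

# Each column is missing in only one of four rows, yet listwise deletion
# keeps just the single fully complete row.
records = [
    {"a": 1, "b": 2, "c": 3},
    {"a": None, "b": 2, "c": 3},
    {"a": 1, "b": None, "c": 3},
    {"a": 1, "b": 2, "c": None},
]
complete = listwise_delete(records)
print(len(records), "rows before,", len(complete), "rows after listwise deletion")
```

The second half shows why listwise deletion can be costly: modest missingness scattered across different columns adds up, so far more rows are lost than any single column's missing rate suggests.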

Taught by

Keith McCormick

Related Courses

Lean Data Approaches to Measure Social Impact (Acumen Academy)
Advanced Manufacturing Process Analysis (University at Buffalo via Coursera)
Artificial Intelligence Data Fairness and Bias (LearnQuest via Coursera)
AI in Healthcare Capstone (Stanford University via Coursera)
Google Data Analytics (PT) (Google via Coursera)