YoVDO

Dimensionality Reduction in Python

Offered By: DataCamp

Tags

Python Courses, Data Visualization Courses, Feature Extraction Courses, Dimensionality Reduction Courses, Feature Selection Courses, High-dimensional Data Courses, t-SNE Courses, Curse of Dimensionality Courses

Course Description

Overview

Understand the concept of reducing dimensionality in your data, and master the techniques to do so in Python.

High-dimensional datasets can be overwhelming and leave you not knowing where to start. Typically, you’d explore a new dataset visually first, but when it has too many dimensions, the classical approaches fall short. Fortunately, there are visualization techniques designed specifically for high-dimensional data, and this course introduces them. After exploring the data, you’ll often find that many features hold little information, either because they show no variance or because they duplicate other features. You’ll learn how to detect these features and drop them from the dataset so that you can focus on the informative ones. As a next step, you might want to build a model on the remaining features, and it may turn out that some have no effect on the target you’re trying to predict. You’ll learn how to detect and drop these irrelevant features too, reducing dimensionality and thus complexity. Finally, you’ll learn how feature extraction techniques can reduce dimensionality for you by calculating uncorrelated principal components.
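To make the idea of dropping uninformative features concrete, here is a minimal NumPy sketch. It is not taken from the course materials: the function name `drop_uninformative_features`, the thresholds, and the toy data are all assumptions chosen for illustration. It removes features with (near) zero variance, then greedily removes features that are almost perfectly correlated with one already kept.

```python
import numpy as np


def drop_uninformative_features(X, min_variance=1e-8, max_abs_corr=0.95):
    """Return the column indices of X worth keeping (hypothetical helper)."""
    X = np.asarray(X, dtype=float)

    # Step 1: keep only features whose variance exceeds the threshold.
    variances = X.var(axis=0)
    candidates = [j for j in range(X.shape[1]) if variances[j] > min_variance]

    # Step 2: greedily skip any feature that is strongly correlated
    # (in absolute value) with a feature that has already been kept.
    kept = []
    for j in candidates:
        is_redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > max_abs_corr
            for k in kept
        )
        if not is_redundant:
            kept.append(j)
    return kept


# Toy data (assumed): two independent features, one constant column,
# and one column that is an exact linear copy of the first feature.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, np.full(200, 3.0), 2.0 * a + 1.0])

kept = drop_uninformative_features(X)
print("Columns kept:", kept)
```

In this toy setup the constant column and the linear copy are discarded, leaving the two independent features. The thresholds are arbitrary starting points; sensible values depend on the scale and noise of the data at hand.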

Syllabus

  • Exploring High Dimensional Data
    • You'll be introduced to the concept of dimensionality reduction and will learn when and why it is important. You'll learn the difference between feature selection and feature extraction and will apply both techniques for data exploration. The chapter ends with a lesson on t-SNE, a powerful feature extraction technique that will allow you to visualize a high-dimensional dataset.
  • Feature Selection I - Selecting for Feature Information
    • In this first of two chapters on feature selection, you'll learn about the curse of dimensionality and how dimensionality reduction can help you overcome it. You'll be introduced to a number of techniques to detect and remove features that add little value to the dataset, either because they have little variance, because they have too many missing values, or because they are strongly correlated with other features.
  • Feature Selection II - Selecting for Model Accuracy
    • In this second chapter on feature selection, you'll learn how to let models help you find the most important features in a dataset for predicting a particular target feature. In the final lesson of this chapter, you'll combine the advice of multiple different models to decide which features are worth keeping.
  • Feature Extraction
    • This chapter is a deep dive into the most frequently used dimensionality reduction algorithm, Principal Component Analysis (PCA). You'll build intuition for how this algorithm works and why it is so powerful, and you'll apply it both for data exploration and for data pre-processing in a modeling pipeline. You'll end with an image compression use case.
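As a rough illustration of the PCA idea named in the Feature Extraction chapter, the sketch below computes principal components with a plain NumPy singular value decomposition. It is an assumption-laden example, not the course's own code: the function name `pca`, the synthetic data, and the choice of two components are all hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its first principal components (hypothetical helper)."""
    X = np.asarray(X, dtype=float)

    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)

    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]

    # Coordinates of every sample in the reduced space.
    scores = X_centered @ components.T

    # Share of the total variance captured by each retained component.
    explained_variance_ratio = (S**2 / np.sum(S**2))[:n_components]
    return scores, components, explained_variance_ratio


# Synthetic data (assumed): 5 observed features driven by 2 hidden factors
# plus a small amount of noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(300, 5))

scores, components, ratio = pca(X, n_components=2)
print("Reduced shape:", scores.shape)
print("Explained variance ratio:", ratio)
```

Because the five features here are generated from only two underlying factors, two components capture nearly all of the variance, and the resulting score columns are uncorrelated with each other. Real datasets rarely compress this cleanly, so the explained variance ratio is worth inspecting before choosing how many components to keep.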

Taught by

Jeroen Boeye and Aleksandra Vercauteren

Related Courses

  • Vector Databases Professional Certificate by Weaviate (LinkedIn Learning)
  • Simple Parallel Coordinates Plot using d3 js (Coursera Project Network via Coursera)
  • Data Analysis and Visualization (Georgia Institute of Technology via Udacity)
  • Attacking Byzantine Robustness in High Dimensions (IEEE via YouTube)
  • High-Dimension Perspective on Extracting & Encoding Information in Chemical Systems (Institute for Pure & Applied Mathematics (IPAM) via YouTube)