
Data Preprocessing and Feature Engineering for Machine Learning

Offered By: Data Science Festival via YouTube

Tags

Data Preprocessing Courses, Machine Learning Courses, Python Courses, Data Cleaning Courses, Data Transformation Courses, Feature Engineering Courses, Outlier Detection Courses

Course Description

Overview

This 40-minute conference talk from the Data Science Festival Summer School 2021 covers data preprocessing and feature engineering for machine learning: the steps data scientists take to clean and prepare raw data before training a model. The speaker is Soledad Galli, Lead Data Scientist at Train in Data.

The talk walks through the main feature engineering processes: missing data imputation, categorical encoding, handling rare labels, mathematical and distribution transformations, discretization, outlier treatment, feature combination and the creation of new variables, variable magnitude and scaling methods, and working with datetime variables, transactions, and time series data. For each technique it explains when and why to use it, its advantages, assumptions, and limitations, and which algorithms it suits. It also compares implementations of these techniques across open-source Python libraries and stresses the importance of building efficient data preprocessing pipelines for machine learning projects.
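As a rough illustration of a few of the techniques named above (median imputation, rare-label grouping, one-hot encoding, and standardization), here is a minimal library-free sketch. It is not taken from the talk: the helper names, the toy `ages` and `cities` data, and the 20% rarity threshold are all assumptions chosen for the example.

```python
from collections import Counter
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def group_rare_labels(labels, min_fraction=0.2, other="Rare"):
    """Collapse categories seen in fewer than min_fraction of rows into one label."""
    counts = Counter(labels)
    total = len(labels)
    return [lab if counts[lab] / total >= min_fraction else other for lab in labels]


def one_hot(labels):
    """Return the sorted category list and one 0/1 indicator row per label."""
    categories = sorted(set(labels))
    rows = [[1 if lab == cat else 0 for cat in categories] for lab in labels]
    return categories, rows


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data: one numeric and one categorical variable.
ages = [22, None, 35, 41, None, 29]
cities = ["London", "London", "Paris", "Oslo", "London", "Paris"]

ages_imputed = impute_median(ages)           # None -> median of observed ages
cities_grouped = group_rare_labels(cities)   # "Oslo" (1 of 6 rows) -> "Rare"
categories, encoded = one_hot(cities_grouped)
ages_scaled = standardize(ages_imputed)
```

In practice these steps are usually delegated to a library so that the parameters (the median, the list of frequent categories, the mean and standard deviation) are learned on training data only and reused unchanged on new data.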

Syllabus

Intro
Uses of Machine Learning
Machine Learning Models
Data Format and Quality
Challenges of Feature Engineering
Open-source for Feature engineering
Why Open-source
Missing Data Imputation Techniques
Categorical Variables
Categorical Encoding Techniques
Encoding Techniques: Rare labels
Distributions
Mathematical transformations
Discretisation
Outliers
Feature Combination
Variable Magnitude
Feature scaling methods
Datetime Variables
Transactions and Time Series
Pipeline
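
The final syllabus item, the preprocessing pipeline, could look something like the following sketch. It assumes scikit-learn and pandas are installed and uses scikit-learn's `Pipeline` and `ColumnTransformer`; the talk compares several open-source libraries, and this is only one possible choice, not necessarily the one it recommends. The column names and toy data are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: a numeric column with gaps and a categorical column.
df = pd.DataFrame({
    "age": [22, float("nan"), 35, 41, float("nan"), 29, 52, 47],
    "city": ["London", "Paris", "Paris", "Oslo", "London", "Paris", "London", "Oslo"],
    "bought": [0, 0, 1, 1, 0, 0, 1, 1],
})
X = df[["age", "city"]]
y = df["bought"]

# Numeric branch: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each column to its own preprocessing; categories not seen during
# fitting are encoded as all zeros instead of raising an error.
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Preprocessing and model are fitted together as one object.
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(X, y)

# New data with a missing age and an unseen city goes through the same steps.
new_rows = pd.DataFrame({"age": [float("nan")], "city": ["Madrid"]})
predictions = model.predict(new_rows)
```

Bundling the steps this way means the imputation median, scaling statistics, and category list are all learned from the training data and applied consistently at prediction time.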


Taught by

Data Science Festival

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent