
Modern Data Science with Vaex - A New Approach to DataFrames and Pipelines

Offered By: EuroPython Conference via YouTube

Course Description

Overview

Explore modern data science techniques using Vaex, an out-of-core DataFrame library for Python, in this 51-minute EuroPython Conference talk. Learn how to process large datasets efficiently on a personal computer by leveraging computational graphs, lazy evaluation, memory-mapped storage, and out-of-core algorithms. Discover methods for cleaning, filtering, grouping, and transforming data, and for visualizing and analyzing correlations. See how to handle datasets with millions or billions of samples without relying on distributed computing. The speaker works through practical examples using New York City taxi data, covering expressions, memory mapping, missing values, filtering, categorizing, group-by operations, density maps, machine learning, and virtual columns. The talk also explains how Vaex makes efficient use of memory and CPU, so data scientists can work on laptops or workstations with limited RAM but fast SSD storage.
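To make "lazy evaluation" and "virtual columns" concrete, here is a minimal pure-Python sketch. It is not Vaex's API: the `Expr` class and the `chunked` helper are hypothetical names chosen for illustration. An expression only records how to compute a value; the arithmetic runs when `sum` is called, and it runs one chunk at a time, which is the basic idea behind out-of-core evaluation. The in-memory dict stands in for a memory-mapped file, and the fare and tip values are made up.

```python
import operator
from typing import Callable, Dict, Iterable, Iterator, List, Union

# One block of rows: column name -> values (stand-in for a slice of a memory-mapped file).
Chunk = Dict[str, List[float]]


class Expr:
    """Hypothetical lazy expression: stores *how* to compute values, not the values."""

    def __init__(self, fn: Callable[[Chunk], List[float]]):
        self.fn = fn

    @classmethod
    def col(cls, name: str) -> "Expr":
        # Leaf node of the computation graph: read one column from the current chunk.
        return cls(lambda chunk: chunk[name])

    def _binop(self, other: Union["Expr", float], op) -> "Expr":
        # Build a new graph node; no arithmetic happens here.
        if isinstance(other, Expr):
            return Expr(lambda c: [op(a, b) for a, b in zip(self.fn(c), other.fn(c))])
        return Expr(lambda c: [op(a, other) for a in self.fn(c)])

    def __add__(self, other): return self._binop(other, operator.add)
    def __sub__(self, other): return self._binop(other, operator.sub)
    def __mul__(self, other): return self._binop(other, operator.mul)
    def __truediv__(self, other): return self._binop(other, operator.truediv)

    def sum(self, chunks: Iterable[Chunk]) -> float:
        # Evaluate the graph chunk by chunk, so only one chunk is materialized at a time.
        return sum(sum(self.fn(chunk)) for chunk in chunks)


def chunked(data: Chunk, size: int) -> Iterator[Chunk]:
    """Hypothetical helper: yield column-aligned slices of at most `size` rows."""
    n = len(next(iter(data.values())))
    for start in range(0, n, size):
        yield {name: values[start:start + size] for name, values in data.items()}


trips = {"fare": [10.0, 20.0, 8.0, 12.0], "tip": [2.0, 3.0, 0.0, 1.0]}  # illustrative data
total = Expr.col("fare") + Expr.col("tip")  # behaves like a virtual column: nothing computed yet
print(total.sum(chunked(trips, size=2)))  # 56.0
```

Here `total` occupies no memory for its values until it is aggregated. A real library would run vectorized, compiled kernels over memory-mapped arrays rather than Python lists; the list comprehensions only keep the sketch dependency-free.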

Syllabus

Introduction
Dataset options
Who is Jovan
Demo
Expressions
Data Science Example
Memory Map
Missing Values
Number of Passengers
Trip Distances
New York
New York City
Filter
Trip duration
Categorizing
Group by Standard
Density Maps
Machine Learning
Memory
PCA
PCA on a subsample
Payment type
String operations
Memory usage
LightGBM
Predict method
Wrappers
Virtual columns
Testing the notebook
Conclusion
Questions
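The "Density Maps" chapter concerns visualizing very large numbers of points. A common technique, sketched below under the assumption that a density map means point counts on a regular 2D grid, is to bin every point in one streaming pass and then plot the small grid instead of the raw points. `density_grid` is a hypothetical helper, not a Vaex function, and the coordinates are made-up values standing in for taxi pickup locations.

```python
from typing import Iterable, List, Tuple


def density_grid(points: Iterable[Tuple[float, float]],
                 xlim: Tuple[float, float],
                 ylim: Tuple[float, float],
                 shape: Tuple[int, int]) -> List[List[int]]:
    """Hypothetical helper: count points per cell of an nx-by-ny grid in one pass."""
    nx, ny = shape
    (x0, x1), (y0, y1) = xlim, ylim
    grid = [[0] * ny for _ in range(nx)]
    for x, y in points:
        # Skip points outside the half-open limits [x0, x1) and [y0, y1).
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue
        # Map each coordinate to a cell index; min() guards against float rounding at the edge.
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        grid[i][j] += 1
    return grid


pickups = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.5, 0.5)]  # illustrative coordinates
print(density_grid(pickups, xlim=(0.0, 1.0), ylim=(0.0, 1.0), shape=(2, 2)))
# [[2, 0], [0, 1]]
```

Memory use depends only on the grid size, not on the number of points, which is why this style of aggregation scales to datasets that do not fit in RAM.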


Taught by

EuroPython Conference
