Modern Data Science with Vaex - A New Approach to DataFrames and Pipelines
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Explore modern data science techniques using Vaex, a Python DataFrame library, in this 51-minute EuroPython Conference talk. Learn how to process large datasets efficiently on a personal computer by leveraging computational graphs, lazy evaluation, memory-mapped storage, and out-of-core algorithms. See how to clean, filter, group, and transform data, and how to visualize it and analyze correlations. Learn how to handle datasets with millions or billions of samples without relying on distributed computing. The speaker demonstrates practical examples using New York City taxi data, covering expressions, memory mapping, missing values, filtering, categorizing, group-by operations, density maps, machine learning, and virtual columns. Understand how Vaex makes efficient use of memory and CPU, so data scientists can work effectively on laptops or workstations with limited RAM but fast SSD storage.
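To make the ideas of lazy evaluation, computational graphs, memory mapping, and out-of-core reduction more concrete, here is a minimal toy sketch using only the Python standard library. It is NOT Vaex's API or implementation: the classes Node, Const, Column, and BinOp, the helper chunked_mean, and the file layout (float64 columns stored back to back in one binary file) are hypothetical names and assumptions chosen for illustration. Arithmetic on columns only builds a graph; values are read from the memory-mapped file one chunk at a time when a reduction is requested.

```python
import array
import mmap
import operator
import os
import tempfile

# Toy illustration only; not Vaex's API. All names below are hypothetical.
ITEMSIZE = array.array("d").itemsize  # bytes per float64 value


class Node:
    """Base class: arithmetic on nodes builds a graph instead of computing."""

    def __add__(self, other):
        return BinOp(operator.add, self, lift(other))

    def __sub__(self, other):
        return BinOp(operator.sub, self, lift(other))

    def __mul__(self, other):
        return BinOp(operator.mul, self, lift(other))


class Const(Node):
    """Leaf node holding a scalar, broadcast to the length of each chunk."""

    def __init__(self, value):
        self.value = value

    def evaluate(self, mm, start, stop):
        return [self.value] * (stop - start)


class Column(Node):
    """Leaf node: a float64 column starting at a byte offset in a memory-mapped file."""

    def __init__(self, offset):
        self.offset = offset  # byte offset of row 0 of this column

    def evaluate(self, mm, start, stop):
        chunk = array.array("d")
        # Only this slice of the file is copied; the OS pages it in on demand.
        lo = self.offset + start * ITEMSIZE
        hi = self.offset + stop * ITEMSIZE
        chunk.frombytes(mm[lo:hi])
        return chunk.tolist()


class BinOp(Node):
    """Interior node: applies a binary operator to the results of two subgraphs."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, mm, start, stop):
        lhs = self.left.evaluate(mm, start, stop)
        rhs = self.right.evaluate(mm, start, stop)
        return [self.op(a, b) for a, b in zip(lhs, rhs)]


def lift(x):
    """Wrap plain Python numbers as Const nodes."""
    return x if isinstance(x, Node) else Const(x)


def chunked_mean(expr, mm, nrows, chunk_size=1000):
    """Out-of-core reduction: evaluate the graph chunk by chunk, keeping only running totals."""
    total, count = 0.0, 0
    for start in range(0, nrows, chunk_size):
        stop = min(start + chunk_size, nrows)
        values = expr.evaluate(mm, start, stop)
        total += sum(values)
        count += len(values)
    return total / count


nrows = 10_000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "columns.bin")
    with open(path, "wb") as f:
        array.array("d", range(nrows)).tofile(f)                     # column "a": 0, 1, ..., 9999
        array.array("d", (2.0 * i for i in range(nrows))).tofile(f)  # column "b": 0, 2, ..., 19998

    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        a = Column(0)
        b = Column(nrows * ITEMSIZE)
        expr = (a + b) * 0.5  # builds a graph; nothing is read from disk yet
        result = chunked_mean(expr, mm, nrows)

print(result)  # mean of 1.5 * i for i in 0..9999
```

Storing each column contiguously means a Column node touches only its own bytes, and the chunk size trades peak memory against per-chunk overhead. The sketch only illustrates the general technique described in the talk summary; it does not reflect how Vaex actually stores or evaluates data.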
Syllabus
Introduction
Dataset options
Who is Jovan
Demo
Expressions
Data Science Example
Memory Map
Missing Values
Number of Passengers
Trip Distances
New York
New York City
Filter
Trip duration
Categorizing
Group by Standard
Density Maps
Machine Learning
Memory
PCA
PCA on a subsample
Payment type
String operations
Memory usage
LightGBM
Predict method
Wrappers
Virtual columns
Testing the notebook
Conclusion
Questions
Taught by
EuroPython Conference