Pandas 2, Dask or Polars? Quickly Tackling Larger Data on a Single Machine
Offered By: GAIA via YouTube
Course Description
Overview
This 28-minute conference talk compares Pandas 2, Dask, and Polars for handling large datasets efficiently on a single machine. It covers recent advances in each tool: Pandas 2's new Arrow data types, faster calculations, and improved scalability; Dask's ability to scale Pandas across cores and its recent "expressions" optimization; and Polars, a new competitor designed around Arrow with native multicore support. The talk works through a "just about fits in RAM" data task with all three solutions and weighs their pros and cons, so you can make informed choices for research workflows. It also asks whether Pandas operations still require 5x working RAM, how much faster Pandas string operations have become, and how well Polars works with tools like scikit-learn and matplotlib. Presented by Ian Ozsvald, an experienced Chief Data Scientist and author, the talk is aimed at data scientists and researchers who want to optimize their data processing.
Syllabus
Pandas 2, Dask or Polars? Quickly Tackling Larger Data on a Single Machine by Ian Ozsvald
Taught by
GAIA
Related Courses
Machine Learning with RAPIDS - Accelerating Data Science Workflows
Nvidia via YouTube
Streaming Featurization with Ibis, Substrait and Apache Arrow
Open Data Science via YouTube
Sound Data Engineering in Rust - From Bits to DataFrames
Databricks via YouTube
DataFusion and Apache Arrow: Supercharging Data Analytics with a Rust-Based Query Engine
Databricks via YouTube
Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks
Databricks via YouTube