YoVDO

Building NumPy Arrays from CSV Files, Faster than Pandas

Offered By: PyCon US via YouTube

Tags

PyCon US Courses, Python Courses, NumPy Courses

Course Description

Overview

Explore a new approach to converting CSV files into NumPy arrays in this 27-minute PyCon US talk. The talk follows the development of delimited_to_arrays(), a C extension designed to outperform Pandas read_csv() while offering the full configuration options of Python's csv.reader(), optional type discovery for columns, and support for all NumPy dtypes. Its architecture collects Unicode code points per column, converts them to C types, and writes them into NumPy arrays with minimal PyObject creation or reference counting. Incorporated into the StaticFrame library, the implementation achieves significant performance advantages over Pandas across a range of DataFrame shapes and degrees of type heterogeneity. The talk covers the background, design choices, and performance characteristics of the implementation, which builds on the 20-year tradition of extending csv.reader() to meet modern data processing needs.
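The column-wise flow described above (collect fields per column, optionally discover a type, then convert into one NumPy array per column) can be illustrated with a minimal pure-Python sketch. This is not the talk's C extension and makes no claim about its internals or its speed: the names discover_dtype and csv_to_arrays are hypothetical, the type-discovery rules (bool, then int, then float, then str) are an assumption made for illustration, and the sketch creates a Python object per field, which is exactly the cost the real implementation is said to avoid.

```python
import csv
import io

import numpy as np

# Hypothetical per-kind parsers; the real extension converts to C types instead.
PARSERS = {"b": lambda v: v == "True", "i": int, "f": float, "U": str}


def discover_dtype(values):
    """Guess a dtype for one column: bool, then int, then float, else str.

    These discovery rules are an assumption made for illustration only.
    """
    if values and all(v in ("True", "False") for v in values):
        return np.dtype(bool)
    for caster, dtype in ((int, np.dtype(np.int64)), (float, np.dtype(np.float64))):
        try:
            for v in values:
                caster(v)
        except ValueError:
            continue  # at least one field failed to parse; try the next type
        else:
            return dtype
    return np.dtype(str)


def csv_to_arrays(text, dtypes=None, **reader_kwargs):
    """Return one NumPy array per column of delimited text (hypothetical helper).

    Extra keyword arguments are passed through to csv.reader(), so options
    such as delimiter and quotechar work as they do in the standard library.
    If dtypes is None, each column's dtype is discovered from its fields.
    """
    reader = csv.reader(io.StringIO(text), **reader_kwargs)
    columns = []
    for row in reader:
        if not columns:
            columns = [[] for _ in row]  # one field store per column
        for store, field in zip(columns, row):
            store.append(field)

    arrays = []
    for i, values in enumerate(columns):
        dtype = np.dtype(dtypes[i]) if dtypes is not None else discover_dtype(values)
        parse = PARSERS[dtype.kind]
        arrays.append(np.array([parse(v) for v in values], dtype=dtype))
    return arrays


arrays = csv_to_arrays("1,2.5,a,True\n3,4,b,False\n")
piped = csv_to_arrays("1|2\n3|4\n", delimiter="|")
```

Passing an explicit dtypes sequence skips discovery, mirroring the description of type discovery as optional. The sketch handles only boolean, integer, float, and string kinds, whereas the talk describes support for all NumPy dtypes.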

Syllabus

Talks - Christopher Ariza: Building NumPy Arrays from CSV Files, Faster than Pandas


Taught by

PyCon US

Related Courses

Computational Investing, Part I
Georgia Institute of Technology via Coursera
Introduction to Machine Learning
Higher School of Economics via Coursera
Mathematics and Python for Data Analysis
Moscow Institute of Physics and Technology via Coursera
Introduction to Python for Data Science
Microsoft via edX
Using Python for Research
Harvard University via edX