YoVDO

Small Big Data - Using NumPy and Pandas When Your Data Doesn't Fit in Memory

Offered By: PyCon US via YouTube

Tags

PyCon US Courses Python Courses pandas Courses NumPy Courses Data Processing Courses

Course Description

Overview

Learn techniques for handling datasets that are too large to fit in memory but too small to justify a Big Data cluster in this 26-minute PyCon US talk. Discover how to process "Small Big Data" efficiently with NumPy and Pandas through money-saving strategies: compression, batching, and indexing. Explore practical solutions such as using NumPy dtypes, sparse arrays, and Pandas dtypes for compression; chunking with Zarr and Pandas; and leveraging SQLite for indexing. Gain insights that carry over to other libraries and specific data scenarios, empowering you to tackle data-processing challenges effectively.
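As a minimal sketch of the dtype-based compression the talk covers (the array sizes and column names here are hypothetical), shrinking a float64 array to float32 halves its footprint, and converting a low-cardinality string column to Pandas' `category` dtype cuts memory dramatically:

```python
import numpy as np
import pandas as pd

# A float64 array uses 8 bytes per element; float32 uses 4.
a64 = np.ones(100_000, dtype=np.float64)
a32 = a64.astype(np.float32)
assert a32.nbytes == a64.nbytes // 2

# A string column with few distinct values compresses well as "category":
# each row stores a small integer code instead of a full Python string.
df = pd.DataFrame({"city": ["NYC", "LA", "SF", "LA"] * 25_000})
as_cat = df["city"].astype("category")
assert as_cat.memory_usage(deep=True) < df["city"].memory_usage(deep=True)

# Dtypes can also be specified at load time, e.g.:
# pd.read_csv("data.csv", dtype={"city": "category", "price": "float32"})
```

The same idea extends to integer downcasting (`int64` to `int32`/`int16`) when the value range allows it.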

Syllabus

Small Big Data
Prelude: the most important question
TIME FOR A BIG DATA CLUSTER!!!!
A non-solution: don't use RAM, just disk
The software solution: use less RAM
Compression: Numpy dtypes
Compression: sparse arrays
Compression: Pandas dtypes (specifying types when loading data)
Chunking: loading Numpy chunks with Zarr
Chunking: with Pandas
Indexing: the simplest solution
Indexing: Pandas without indexing
Indexing: populate SQLite from Pandas
Indexing: load from SQLite into DataFrame
Indexing: SQLite vs. CSV
Conclusion: what about other libraries?
Conclusion: don't forget about
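The chunking and SQLite-indexing steps in the syllabus can be sketched as follows; this is an illustrative example under assumed data (the `data.csv` file, `city`/`price` columns, and `items` table are stand-ins, not from the talk):

```python
import sqlite3
import pandas as pd

# Build a small sample file; in the talk's scenario this would be a file
# too large to load into memory all at once.
pd.DataFrame({
    "city": ["NYC", "LA", "SF"] * 1000,
    "price": [1.0, 2.0, 3.0] * 1000,
}).to_csv("data.csv", index=False)

# Chunking: process the CSV in fixed-size pieces rather than loading it whole.
total = 0.0
for chunk in pd.read_csv("data.csv", chunksize=500):
    total += chunk["price"].sum()

# Indexing: load the data into SQLite once, then query only the rows needed,
# so memory use is bounded by the result set rather than the whole file.
conn = sqlite3.connect(":memory:")
for chunk in pd.read_csv("data.csv", chunksize=500):
    chunk.to_sql("items", conn, if_exists="append", index=False)
conn.execute("CREATE INDEX idx_city ON items(city)")

subset = pd.read_sql("SELECT * FROM items WHERE city = ?", conn,
                     params=("NYC",))
```

With an index on the lookup column, repeated selective queries against SQLite avoid rescanning the full CSV each time, which is the trade-off the "SQLite vs. CSV" section weighs.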


Taught by

PyCon US

Related Courses

Computational Investing, Part I
Georgia Institute of Technology via Coursera
Introduction to Machine Learning
Higher School of Economics via Coursera
Mathematics and Python for Data Analysis
Moscow Institute of Physics and Technology via Coursera
Introduction to Python for Data Science
Microsoft via edX
Using Python for Research
Harvard University via edX