Small Big Data - Using NumPy and Pandas When Your Data Doesn't Fit in Memory
Offered By: PyCon US via YouTube
Course Description
Overview
Learn techniques for handling datasets that are too large to fit in memory yet too small to justify a Big Data cluster in this 26-minute PyCon US talk. Discover how to process Small Big Data efficiently using NumPy and Pandas through money-saving strategies, compression techniques, batching methods, and indexing approaches. Explore practical solutions such as NumPy dtypes, sparse arrays, and Pandas dtypes for compression, chunking with Zarr and Pandas, and indexing with SQLite, each sketched in code below. Gain insights that carry over to other libraries and specific data scenarios, empowering you to tackle data processing challenges effectively.
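For example, here is a minimal sketch of the dtype-based compression idea. It is not taken from the talk itself; the CSV file name and column names are illustrative assumptions. Narrower NumPy dtypes halve or quarter per-value memory cost, and Pandas can be told the compact dtypes at load time so the wide defaults never materialize:

import numpy as np
import pandas as pd

# float64 costs 8 bytes per value; float32 halves that.
values = np.arange(1_000_000, dtype=np.float64)
print(values.nbytes)                      # 8000000
print(values.astype(np.float32).nbytes)   # 4000000

# Specify compact dtypes at load time so the full-width
# representation is never built in memory.
df = pd.read_csv("measurements.csv",
                 dtype={"sensor_id": "int32",
                        "reading": "float32",
                        "city": "category"})
print(df.memory_usage(deep=True))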
Syllabus
Small Big Data
Prelude: the most important question
TIME FOR A BIG DATA CLUSTER!!!!
A non-solution: don't use RAM, just disk
The software solution: use less RAM
Compression: Numpy dtypes
Compression: sparse arrays
Compression: Pandas dtypes (specify types when loading data)
Chunking: loading Numpy chunks with Zarr
Chunking: with Pandas
Indexing: the simplest solution
Indexing: Pandas without indexing
Indexing: populate SQLite from Pandas
Indexing: load from SQLite into DataFrame
Indexing: SQLite vs. CSV
Conclusion: what about other libraries?
Conclusion: don't forget about
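The sketches below make the remaining syllabus topics concrete. None are taken from the talk itself; file, table, and column names are illustrative assumptions. First, compression with sparse arrays: when most entries are zero, a sparse representation stores only the non-zero values. This sketch uses SciPy's CSR format (the talk may use a different sparse library), and for illustration it builds the dense array first:

import numpy as np
from scipy import sparse

# A mostly-zero dense array wastes 8 bytes per zero.
dense = np.zeros((10_000, 1_000), dtype=np.float64)
dense[::100, ::10] = 1.0   # only ~0.1% of entries are non-zero

# CSR keeps just the non-zero values plus index bookkeeping.
csr = sparse.csr_matrix(dense)
print(dense.nbytes)   # 80,000,000 bytes
print(csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes)  # a tiny fraction of that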
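Chunking with Zarr: store the array on disk split into chunks, then process one chunk at a time so only a few megabytes are resident at once. A minimal sketch assuming the zarr package's array API; the store path, shape, and chunk size are assumptions:

import numpy as np
import zarr

# Create an on-disk array split into 10,000-row chunks.
z = zarr.open("big_array.zarr", mode="w",
              shape=(1_000_000, 100), chunks=(10_000, 100),
              dtype="float64")
# Stand-in for real data, written one chunk at a time.
z[:10_000] = np.random.default_rng(0).random((10_000, 100))

# Sum the whole array while holding only one chunk in RAM.
total = 0.0
for start in range(0, z.shape[0], 10_000):
    total += z[start:start + 10_000].sum()
print(total)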
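Chunking with Pandas: read_csv can stream a file in fixed-size chunks, so an aggregate can be computed without ever holding the whole table. The file and column names here are assumptions:

import pandas as pd

total = 0.0
for chunk in pd.read_csv("transactions.csv", chunksize=100_000):
    # Each chunk is an ordinary DataFrame of up to 100,000 rows.
    total += chunk["amount"].sum()
print(total)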
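Indexing with SQLite: load the data into SQLite once, index the lookup column, and pull only the matching rows back into a DataFrame. Unlike a CSV, which must be scanned end to end, the index turns each lookup into a cheap, low-memory query. A minimal sketch; the table and column names are assumptions:

import sqlite3
import pandas as pd

conn = sqlite3.connect("payments.sqlite")

# Populate SQLite from Pandas (done once, possibly in chunks).
df = pd.DataFrame({"user_id": [1, 2, 2, 3],
                   "amount": [10.0, 5.0, 7.5, 3.0]})
df.to_sql("payments", conn, if_exists="replace", index=False)
conn.execute("CREATE INDEX IF NOT EXISTS idx_user ON payments (user_id)")

# Load only the rows for one user into memory.
subset = pd.read_sql_query(
    "SELECT user_id, amount FROM payments WHERE user_id = ?",
    conn, params=(2,))
print(subset)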
Taught by
PyCon US
Related Courses
Computational Investing, Part I - Georgia Institute of Technology via Coursera
Introduction to Machine Learning - Higher School of Economics via Coursera
Mathematics and Python for Data Analysis - Moscow Institute of Physics and Technology via Coursera
Introduction to Python for Data Science - Microsoft via edX
Using Python for Research - Harvard University via edX