Faster pandas

Offered By: LinkedIn Learning

Course Description

Overview

Learn how to make your pandas code quicker and more efficient. This course covers vectorization, common mistakes, pandas performance, saving memory, Numba, Cython, and more.

Syllabus

Introduction

pandas and performance
What you should know
Working with the files on GitHub

1. Overview

Why performance matters
Setting goals
Measuring performance
Profiling
Challenge: Identify bottleneck
Solution: Identify bottleneck

2. Vectorization

What is vectorization?
Boolean indexing
Understanding ufuncs
Challenge: Selecting and manipulating data
Solution: Selecting and manipulating data

3. Common Mistakes

The limitations of appending
The limitations of object dtype
The limitations of row iteration
Understanding the isin function
Parsing time once
Challenge: Query a DataFrame
Solution: Query a DataFrame

4. pandas Performance

Using built-in functions
Understanding eval and query
Understanding the join function
Challenge: Join and query
Solution: Join and query

5. Saving Memory

Why memory is important
Measuring memory
Loading parts of data
Categorical data
Challenge: Reducing memory
Solution: Reducing memory

6. Fast Serialization

Various formats and why not CSV
Optimizing with SQL
Optimizing with HDF5
Challenge: Bike ride duration
Solution: Bike ride duration

7. Numba and Cython

What is Numba?
Using Numba
What's Cython?
Writing Cython code
Compiling Cython
%%cython magic
Challenge: Cython speedup
Solution: Cython speedup

8. Alternative DataFrames

Overview of alternative DataFrames
Using Dask
Using Vaex
Challenge: Vaex vs. pandas
Solution: Vaex vs. pandas

Conclusion

Next steps

Taught by

Miki Tebeka
