Faster pandas
Offered By: LinkedIn Learning
Course Description
Overview
Learn how to make your pandas code quicker and more efficient. This course covers vectorization, common mistakes, pandas performance, saving memory, Numba, Cython, and more.
Syllabus
Introduction
- pandas and performance
- What you should know
- Working with the files on GitHub
- Why performance matters
- Setting goals
- Measuring performance
- Profiling
- Challenge: Identify bottleneck
- Solution: Identify bottleneck
- What is vectorization?
- Boolean indexing
- Understanding ufuncs
- Challenge: Selecting and manipulating data
- Solution: Selecting and manipulating data
- The limitations of appending
- The limitations of object dtype
- The limitations of row iteration
- Understanding the isin function
- Parsing time once
- Challenge: Query a DataFrame
- Solution: Query a DataFrame
- Using built-in functions
- Understanding eval and query
- Understanding the join function
- Challenge: Join and query
- Solution: Join and query
- Why memory is important
- Measuring memory
- Loading parts of data
- Categorical data
- Challenge: Reducing memory
- Solution: Reducing memory
- Various formats and why not CSV
- Optimizing with SQL
- Optimizing with HDF5
- Challenge: Bike ride duration
- Solution: Bike ride duration
- What is Numba?
- Using Numba
- What's Cython?
- Writing Cython code
- Compiling Cython
- %%cython magic
- Challenge: Cython speedup
- Solution: Cython speedup
- Overview of alternative DataFrames
- Using Dask
- Using Vaex
- Challenge: Vaex vs. pandas
- Solution: Vaex vs. pandas
- Next steps
Taught by
Miki Tebeka
Related Courses
Faster Python Code (LinkedIn Learning)