Faster pandas
Offered By: LinkedIn Learning
Course Description
Overview
Learn how to make your pandas code quicker and more efficient. This course covers vectorization, common mistakes, pandas performance, saving memory, Numba, Cython, and more.
Syllabus
Introduction
- pandas and performance
- What you should know
- Working with the files on GitHub
- Why performance matters
- Setting goals
- Measuring performance
- Profiling
- Challenge: Identify bottleneck
- Solution: Identify bottleneck
- What is vectorization?
- Boolean indexing
- Understanding ufuncs
- Challenge: Selecting and manipulating data
- Solution: Selecting and manipulating data
- The limitations of appending
- The limitations of object dtype
- The limitations of row iteration
- Understanding the isin function
- Parsing time once
- Challenge: Query a DataFrame
- Solution: Query a DataFrame
- Using built-in functions
- Understanding eval and query
- Understanding the join function
- Challenge: Join and query
- Solution: Join and query
- Why memory is important?
- Measuring memory
- Loading parts of data
- Categorical data
- Challenge: Reducing memory
- Solution: Reducing memory
- Various formats and why not CSV
- Optimizing with SQL
- Optimizing with HDF5
- Challenge: Bike ride duration
- Solution: Bike ride duration
- What is Numba?
- Using Numba
- What's Cython?
- Writing Cython code
- Compiling Cython
- %%cython magic
- Challenge: Cython speedup
- Solution: Cython speedup
- Overview of alternative DataFrames
- Using Dask
- Using Vaex
- Challenge: Vaex vs. pandas
- Solution: Vaex vs. pandas
- Next steps
Taught by
Miki Tebeka
Related Courses
- Computational Investing, Part I (Georgia Institute of Technology via Coursera)
- Introduction to Machine Learning (Введение в машинное обучение) (Higher School of Economics via Coursera)
- Mathematics and Python for Data Analysis (Математика и Python для анализа данных) (Moscow Institute of Physics and Technology via Coursera)
- Introduction to Python for Data Science (Microsoft via edX)
- Python for Data Science (University of California, San Diego via edX)