Comparing the Different Ways to Scale Python and Pandas Code
Offered By: PyCon US via YouTube
Course Description
Overview
Explore the different approaches to scaling Python and Pandas code in this PyCon US talk. Learn about Fugue, an open-source unified interface for Pandas, Spark, and Dask that enables scale-agnostic compute workflows. Discover how decoupling logic from execution lets you write code with familiar tools such as Python, Pandas, or SQL and then choose the execution engine separately. Dive into the transform() function, which lets a single function run in a distributed setting. Understand the limitations of Pandas, the main distributed computing frameworks, and how Fugue lowers the barrier to entry for distributed computing. Compare eager and lazy evaluation, examine expectations versus reality in data processing, and explore a Spark solution based on traditional SQL syntax. Finally, see how combining Python with SQL affects code size and execution time in large-scale data processing tasks.
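The idea of decoupling logic from execution can be illustrated with a minimal, standard-library-only sketch. This is not Fugue's API: the names add_total, run_local, and run_partitioned are hypothetical, and a thread pool merely stands in for a distributed engine. The business logic is written once as a plain function, and two interchangeable "engines" then run it either on the whole dataset at once or partition by partition. In Fugue itself, transform() fills this engine-selection role for Pandas, Spark, and Dask DataFrames.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, float]
RowFunc = Callable[[List[Row]], List[Row]]


def add_total(rows: List[Row]) -> List[Row]:
    """Hypothetical business logic: plain Python, unaware of how it is executed."""
    return [{**row, "total": row["price"] * row["qty"]} for row in rows]


def run_local(func: RowFunc, rows: List[Row]) -> List[Row]:
    """Hypothetical 'engine' 1: apply the function to the whole dataset in one call."""
    return func(rows)


def run_partitioned(func: RowFunc, rows: List[Row], n_partitions: int = 2) -> List[Row]:
    """Hypothetical 'engine' 2: split rows into contiguous partitions and apply the
    function to each partition concurrently, loosely mimicking a distributed engine."""
    size = max(1, -(-len(rows) // n_partitions))  # ceiling division, at least 1
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(func, partitions)  # map() preserves partition order
        return [row for part in results for row in part]


rows = [{"price": 2.0, "qty": 3}, {"price": 1.5, "qty": 4}, {"price": 10.0, "qty": 1}]
print(run_local(add_total, rows) == run_partitioned(add_total, rows))  # prints True
```

Because add_total only ever looks at the rows it is given, it returns the same result under both engines. Logic that needs all rows together, such as a global sort or a per-group aggregate whose groups span partitions, would not; this is the kind of Pandas assumption about data being physically together that the talk examines.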
Syllabus
Intro
Pandas Limitations
How To Scale Out?
Distributed Computing Frameworks
Reducing Barrier to Entry
Introduction to Fugue
Fugue Transform
Bringing it to Spark
The DataFrame For Tests
Pandas Assumes Data Is Physically Together
Pandas Assumes Data Shuffle is Cheap
Pandas Assumes Eager Evaluation
Eager vs Lazy Evaluation
Expectation vs Reality
A Spark Solution Based On Traditional SQL Syntax
Fugue SQL
Leveraging Python
SQL Code Size & Execution Time
Taught by
PyCon US