YoVDO

Comparing the Different Ways to Scale Python and Pandas Code

Offered By: PyCon US via YouTube

Tags

PyCon US Courses, Python Courses, SQL Courses, pandas Courses, Data Transformation Courses, Distributed Computing Courses, Dask Courses

Course Description

Overview

Explore different approaches to scaling Python and Pandas code in this PyCon US talk. Learn about Fugue, an open-source unified interface for Pandas, Spark, and Dask that enables scale-agnostic compute workflows. Discover how decoupling logic from execution lets you write code in familiar tools such as Python, Pandas, or SQL and then choose the execution engine separately. Dive into the transform() function, which runs a single function on a distributed engine. Understand the limitations of Pandas, the main distributed computing frameworks, and how Fugue lowers the barrier to entry for distributed computing. Compare eager and lazy evaluation, examine expectations versus reality in data processing, and explore a Spark solution based on traditional SQL syntax. See how combining Python and SQL affects code size and execution time in large-scale data processing tasks.
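The idea of decoupling logic from execution can be sketched without any distributed framework. The example below is a minimal illustration using only the Python standard library; it is not Fugue's API, and the names add_total, run_local, and run_partitioned are hypothetical helpers invented for this sketch. The same logic function is handed to two different "engines", one that runs it on all rows at once and one that splits the rows into partitions and runs the function on each partition in a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, int]
Logic = Callable[[List[Row]], List[Row]]


def add_total(rows: List[Row]) -> List[Row]:
    """Business logic: operates on a list of rows, unaware of how it is run."""
    return [{**row, "total": row["price"] * row["qty"]} for row in rows]


def run_local(rows: List[Row], fn: Logic) -> List[Row]:
    """Hypothetical 'local engine': apply the logic to all rows at once."""
    return fn(rows)


def run_partitioned(rows: List[Row], fn: Logic, partitions: int = 2) -> List[Row]:
    """Hypothetical 'partitioned engine': split rows into contiguous chunks,
    apply the logic to each chunk in a thread pool, then concatenate."""
    size = max(1, -(-len(rows) // partitions))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        parts = list(pool.map(fn, chunks))  # map preserves chunk order
    return [row for part in parts for row in part]


data = [
    {"price": 2, "qty": 3},
    {"price": 5, "qty": 1},
    {"price": 4, "qty": 2},
]

local_result = run_local(data, add_total)
partitioned_result = run_partitioned(data, add_total, partitions=2)
```

Because add_total only looks at one row at a time, both runners produce the same rows. Logic that needs to see the whole dataset at once (a global sort, for example) would not be safe to partition this way, which is the kind of assumption the talk's Pandas sections examine.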

Syllabus

Intro
Pandas Limitations
How To Scale Out?
Distributed Computing Frameworks
Reducing Barrier to Entry
Introduction to Fugue
Fugue Transform
Bringing it to Spark
The DataFrame For Tests
Pandas Assumes Data Is Physically Together
Pandas Assumes Data Shuffle is Cheap
Pandas Assumes Eager Evaluation
Eager vs Lazy Evaluation
Expectation vs Reality
A Spark Solution Based On Traditional SQL Syntax
Fugue SQL
Leveraging Python
SQL Code Size & Execution Time
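
The "Eager vs Lazy Evaluation" item in the syllabus above can be illustrated with a small standard-library sketch. This is an assumption-laden analogy, not code from the talk: a list comprehension stands in for eager evaluation (every result is computed immediately), and a generator stands in for lazy evaluation (work happens only when a result is requested). The names expensive, eager_pipeline, and lazy_pipeline are hypothetical.

```python
from typing import Iterable, Iterator, List

calls: List[int] = []  # records every input that expensive() actually processed


def expensive(x: int) -> int:
    """Stand-in for a costly computation; logs each call."""
    calls.append(x)
    return x * x


def eager_pipeline(values: Iterable[int]) -> List[int]:
    """Eager: computes every result as soon as the pipeline is built."""
    return [expensive(v) for v in values]


def lazy_pipeline(values: Iterable[int]) -> Iterator[int]:
    """Lazy: returns a generator; nothing is computed until it is consumed."""
    return (expensive(v) for v in values)


eager = eager_pipeline(range(5))
eager_calls = len(calls)            # all five inputs were processed up front

calls.clear()
lazy = lazy_pipeline(range(5))
lazy_calls_before = len(calls)      # no work has been done yet
first_two = [next(lazy), next(lazy)]
lazy_calls_after = len(calls)       # only the two requested results were computed
```

The lazy version does only the work that is asked for, which is why lazily evaluated engines can skip or reorder steps, while eagerly evaluated code pays for every intermediate result.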


Taught by

PyCon US

Related Courses

Cloud Computing Concepts, Part 1
University of Illinois at Urbana-Champaign via Coursera
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Reliable Distributed Algorithms - Part 1
KTH Royal Institute of Technology via edX
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera
Réalisez des calculs distribués sur des données massives
CentraleSupélec via OpenClassrooms