Speed Up Your Data Processing
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Explore techniques to accelerate data processing in this 30-minute EuroPython 2020 conference talk. Learn about common bottlenecks in data science workflows and how to overcome them using parallel and asynchronous programming with Python's concurrent.futures module. Discover the differences between sequential and parallel processing, synchronous and asynchronous execution, and when to apply these concepts in network I/O operations and computation-driven workloads. Gain practical insights into implementing parallelism and asynchronous programming to optimize data processing pipelines, allowing more focus on extracting value from data. Through real-life analogies, understand concepts like Amdahl's Law, multiprocessing vs multithreading, and practical implementations using ThreadPoolExecutor and ProcessPoolExecutor. Suitable for data scientists, engineers, and anyone with basic Python knowledge interested in improving data processing efficiency.
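As a rough illustration of the sequential versus parallel contrast described above, here is a minimal sketch using `ThreadPoolExecutor` from `concurrent.futures`. It is not taken from the talk: the `toast_slice` function, its sleep duration, and the worker count are hypothetical stand-ins for an I/O-bound task such as a network request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def toast_slice(slice_id, seconds=0.05):
    """Hypothetical stand-in for an I/O-bound task: wait, then return a result."""
    time.sleep(seconds)  # simulates waiting on a toaster, a network call, or a disk
    return f"slice {slice_id} toasted"


def toast_sequentially(slice_ids):
    # One slice at a time: total time grows linearly with the number of slices.
    return [toast_slice(s) for s in slice_ids]


def toast_in_parallel(slice_ids, max_workers=4):
    # Several worker threads wait at the same time; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(toast_slice, slice_ids))


def timed(func, *args):
    # Returns the function's result together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


slice_ids = list(range(8))
sequential_result, sequential_seconds = timed(toast_sequentially, slice_ids)
parallel_result, parallel_seconds = timed(toast_in_parallel, slice_ids)
print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

`ProcessPoolExecutor` exposes the same `map` and `submit` interface, so for computation-driven work the executor class can usually be swapped, provided the function and its arguments are picklable and the script's entry point is guarded with `if __name__ == "__main__":`.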
Syllabus
Intro
A typical data science workflow
Data Processing in Python
Challenges with Data Processing
Task 1: Toast 100 slices of bread
Sequential Processing
Parallel Processing
Task 2: Brew coffee
Synchronous Execution
Practical Considerations
Amdahl's Law and Parallelism
Multiprocessing vs Multithreading
Initialize Submission List
Using ThreadPoolExecutor
Initialize Python modules
Initialize image resize process
Initialize File List in Directory
Using List Comprehensions
Using ProcessPoolExecutor
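The syllabus item on Amdahl's Law can be made concrete with a small calculation. The sketch below uses the standard form of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function name and the 90% figure are illustrative assumptions, not values from the talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup under Amdahl's Law.

    parallel_fraction: share of the workload that can be parallelized (0.0 to 1.0).
    workers: number of parallel workers (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative assumption: 90% of the pipeline can run in parallel.
for workers in (1, 2, 4, 8, 1000):
    print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With 90% of the work parallelizable, the speedup can never reach 1 / (1 - 0.9) = 10, however many workers are added, because the serial 10% still runs at its original speed.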
Taught by
EuroPython Conference
Related Courses
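Coding the Matrix: Linear Algebra through Computer Science Applications, offered by Brown University via Coursera
How Machines Think: An Introduction to Computing Technologies (كيف تفكر الآلات - مقدمة في تقنيات الحوسبة), offered by King Fahd University of Petroleum and Minerals via Rwaq (رواق)
Data Science and Situational Analysis: Behind the Scenes of Big Data (Datascience et Analyse situationnelle : dans les coulisses du Big Data), offered by IONIS via IONIS
Data Lakes for Big Data, offered by EdCast
Statistics I: Fundamentals of Data Analysis (統計学Ⅰ:データ分析の基礎, ga014), offered by University of Tokyo via gacco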