
Downloading a Billion Files in Python

Offered By: EuroPython Conference via YouTube

Tags

EuroPython Courses, Python Courses, Concurrent Programming Courses, Multiprocessing Courses, Multithreading Courses, Asyncio Courses

Course Description

Overview

Explore efficient strategies for downloading a billion small files using Python in this EuroPython 2019 conference talk. Dive into three concurrent downloading mechanisms: multithreading, multiprocessing, and asyncio. Learn design best practices, debugging techniques, error handling, and performance comparisons for each approach. Gain insights into network latency, file size considerations, and API interactions. Examine code examples and performance metrics to understand the trade-offs between different methods. Discover how to optimize your workflow, handle pagination, and improve download speeds. Apply lessons learned to choose the most suitable library for large-scale file downloading tasks.
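The core idea behind the multithreaded approach can be sketched as follows. This is a minimal illustration, not the talk's code: the helper `fake_download` is hypothetical and simulates a latency-bound GET with `time.sleep`, and the 0.05 s latency and worker count are assumed values. Because each small-file request spends most of its time waiting on the network, threads can overlap that waiting even under the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a network GET of one small file.
# The real API from the talk is not shown here; we only simulate latency.
SIMULATED_LATENCY = 0.05  # seconds per request (assumed value)


def fake_download(key: str) -> bytes:
    time.sleep(SIMULATED_LATENCY)  # blocking wait; the GIL is released while sleeping
    return f"contents of {key}".encode()


def download_sequential(keys):
    # One request at a time: total time is roughly len(keys) * latency.
    return [fake_download(k) for k in keys]


def download_threaded(keys, max_workers=10):
    # Up to max_workers requests wait on "the network" at once.
    # pool.map returns results in the same order as the input keys.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, keys))


if __name__ == "__main__":
    keys = [f"file-{i}" for i in range(20)]

    start = time.perf_counter()
    download_sequential(keys)
    seq = time.perf_counter() - start

    start = time.perf_counter()
    download_threaded(keys)
    thr = time.perf_counter() - start

    print(f"sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```

Swapping `ThreadPoolExecutor` for `concurrent.futures.ProcessPoolExecutor` gives a multiprocessing variant with the same `map` interface. The worker function must then be picklable (defined at module level), and every argument and result is serialized between processes, which is the kind of interprocess communication overhead the syllabus lists.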

Syllabus

Introduction
The Task
Understanding the Task
Network Latency
File Size
The API
The Get API
Disclaimers
Synchronous
Multithreading
Coding
Main Loop
Performance
Why is this happening
Things to keep in mind
Multiprocessing
Multiprocessing code
Iterating over pages
Downloader
Speed Improvements
Async IO
List Call
Async IO Task
Different Libraries
uvloop
Setup
aiohttp
ItAll Files
Download Files
Summary
Multiprocessing
Threading
Workflow
Interprocess communication overhead
Pagination token
Combo results
The real summary
Lessons learned
Thank you
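The asyncio approach combined with a paginated listing might look like the sketch below. Everything here is an assumption for illustration: `list_page`, `download`, `ALL_KEYS`, and `PAGE_SIZE` are hypothetical, the list and GET calls are simulated with `asyncio.sleep`, and a real implementation would use an async HTTP client such as aiohttp instead. An `asyncio.Semaphore` caps the number of in-flight requests, and downloads for each page start while the next page is still being listed.

```python
import asyncio

# Hypothetical in-memory "bucket" standing in for the remote listing API.
ALL_KEYS = [f"file-{i}" for i in range(25)]
PAGE_SIZE = 10  # assumed page size


async def list_page(token):
    """Hypothetical List call: returns (keys, next_token).

    next_token is None on the last page, mimicking a pagination token.
    """
    await asyncio.sleep(0.01)  # simulated round trip
    start = token or 0
    end = start + PAGE_SIZE
    next_token = end if end < len(ALL_KEYS) else None
    return ALL_KEYS[start:end], next_token


async def download(key, sem):
    async with sem:  # limit how many requests are in flight at once
        await asyncio.sleep(0.01)  # simulated GET
        return key, f"contents of {key}".encode()


async def download_all(max_concurrency=50):
    sem = asyncio.Semaphore(max_concurrency)
    tasks = []
    token = None
    while True:
        keys, token = await list_page(token)
        # Schedule this page's downloads; they run while the next page is listed.
        tasks.extend(asyncio.create_task(download(k, sem)) for k in keys)
        if token is None:
            break
    results = await asyncio.gather(*tasks)
    return dict(results)


if __name__ == "__main__":
    files = asyncio.run(download_all())
    print(len(files))  # prints 25
```

The default asyncio event loop is used here; uvloop, which the syllabus also covers, is designed as a drop-in replacement for it.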


Taught by

EuroPython Conference
