
Distributed Web Scraping in Python

Offered By: PyCon US via YouTube

Tags

PyCon US Courses, Python Courses, Queues Courses, Message Broker Courses, Code Management Courses

Course Description

Overview

Explore distributed web scraping techniques in Python through this 24-minute PyCon US talk. Learn how to build a scalable and robust distributed web scraper to optimize large batch scraping jobs, reduce processing times, and enhance code durability. Discover the evolution from single requests to distributed systems, understand the advantages and disadvantages of distributed scraping, and gain insights into useful Python packages and considerations for implementation. Follow the speaker's journey through various iterations, addressing issues and implementing improvements along the way. Access accompanying slides for a comprehensive overview of the distributed web scraping process, from mental models to practical implementation using controllers, scraping nodes, queues, and message brokers.
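The controller / scraping node / queue model described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's implementation. An in-process `queue.Queue` stands in for a real message broker, and threads stand in for separate scraping machines. `fake_fetch` is a hypothetical placeholder for an HTTP call, so the example runs without network access. The names `controller` and `scraping_node` are illustrative only.

```python
import queue
import threading


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request (e.g. requests.get(url, timeout=10))."""
    return f"<html>{url}</html>"


def scraping_node(tasks: queue.Queue, results: queue.Queue) -> None:
    """Worker: pull URLs from the task queue until it receives the None sentinel."""
    while True:
        url = tasks.get()
        if url is None:  # sentinel: no more work for this node
            tasks.task_done()
            break
        try:
            results.put((url, fake_fetch(url)))
        except Exception as exc:  # record the failure instead of crashing the node
            results.put((url, exc))
        finally:
            tasks.task_done()


def controller(urls: list[str], n_nodes: int = 4) -> dict[str, object]:
    """Controller: enqueue every URL, start the nodes, then collect the results."""
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    nodes = [
        threading.Thread(target=scraping_node, args=(tasks, results))
        for _ in range(n_nodes)
    ]
    for node in nodes:
        node.start()
    for url in urls:
        tasks.put(url)
    for _ in nodes:
        tasks.put(None)  # one sentinel per node so every worker exits
    tasks.join()  # wait until every queued item has been processed
    for node in nodes:
        node.join()
    collected: dict[str, object] = {}
    while not results.empty():
        url, page = results.get()
        collected[url] = page
    return collected


if __name__ == "__main__":
    pages = controller([f"https://example.com/page/{i}" for i in range(10)])
    print(len(pages))  # 10
```

Because the nodes only communicate through the two queues, you could replace them with a broker-backed queue and run the workers on separate machines without changing the controller's overall shape.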

Syllabus

Intro
Outline
Introduction
Data Science Project Stages
What is Distributed Web Scraping
Setting the Stage
Iteration - A Single Request
Looping Requests
Iteration 1 - Issues
Intermediate improvements
Iteration 2 - Issues
Distributed - Mental Model
Distributed - Controller
Distributed - Scraping Node
Distributed - Advantages
Distributed - Disadvantages
Distributed - Queues
Distributed - Message Brokers
Code Management
Useful Python Packages
Considerations
Conclusion
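The syllabus moves from a single request to looping requests, then to "intermediate improvements" before going distributed. The talk's specific improvements are not listed here. The sketch below assumes a common one: retrying each URL with exponential backoff and recording failures so that one bad URL does not stop the whole loop. `fetch_with_retries`, `scrape_all` and their parameters are hypothetical names. The `fetch` callable is injected so the loop can be exercised without network access.

```python
import time
from typing import Callable


def fetch_with_retries(url: str, fetch: Callable[[str], str],
                       max_attempts: int = 3, base_delay: float = 0.5) -> str:
    """Call fetch(url), retrying with exponential backoff on any exception."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller record the failure
            # Wait base_delay, then double it on each further retry.
            time.sleep(base_delay * 2 ** (attempt - 1))


def scrape_all(urls: list[str], fetch: Callable[[str], str],
               **retry_kwargs) -> tuple[dict, dict]:
    """Sequential scraping loop that keeps going when a single URL fails."""
    pages: dict[str, str] = {}
    failures: dict[str, Exception] = {}
    for url in urls:
        try:
            pages[url] = fetch_with_retries(url, fetch, **retry_kwargs)
        except Exception as exc:
            failures[url] = exc
    return pages, failures
```

Collecting failures separately lets a later run re-queue only the URLs that failed. This is the kind of bookkeeping a distributed controller would otherwise take over.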


Taught by

PyCon US
