Frontera - Open Source Large-Scale Web Crawling Framework
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Explore Frontera, an open-source framework for large-scale web crawling, in this EuroPython 2015 conference talk. Discover how to build both real-time distributed web crawlers and crawlers focused on specific websites using Frontera's customizable URL metadata storage, crawling strategy management, and transport layer abstraction. Learn how Frontera integrates with Scrapy, Kafka, and HBase to form a distributed crawler. Gain insights into the framework's architecture, features, and use cases, including a demonstration that collects statistics from the Spanish internet. Understand the motivation behind Frontera, its single-threaded and real-time design, and its future development plans. The talk is suited to developers interested in advanced web crawling techniques and large-scale data collection.
Syllabus
About me
What is Frontera
What is Terra
Motivation
Single threaded
Single integration
Real time
Unique content
Metadata storage
Architecture
Scraping
Simple spider
Use cases
Distributed architecture
Features
Requirements
Quick start
Spanish crawl
Future plans
Questions
Taught by
EuroPython Conference
Related Courses
Advanced Operating Systems (Georgia Institute of Technology via Udacity)
High Performance Computing (Georgia Institute of Technology via Udacity)
GT - Refresher - Advanced OS (Georgia Institute of Technology via Udacity)
Distributed Machine Learning with Apache Spark (University of California, Berkeley via edX)
CS125x: Advanced Distributed Machine Learning with Apache Spark (University of California, Berkeley via edX)