Web Scraping in R

Offered By: DataCamp

Tags

R Programming Courses, Computer Science Courses, Web Scraping Courses, XPath Courses, HTML Courses, HTTP Requests Courses

Course Description

Overview

Learn how to efficiently collect and download data from any website using R.

Have you ever come across a website that displays a lot of data, such as statistics, product reviews, or prices, in a format that isn’t ready for data analysis? Authorities and other data providers often publish their data in neatly formatted tables, but not all of these sites include a download button. Don’t despair: in this course, you’ll learn how to efficiently collect and download data from any website using R. You’ll learn how to automate the scraping and parsing of Wikipedia using the rvest and httr packages. Through hands-on exercises, you’ll also expand your understanding of HTML and CSS, the building blocks of web pages, as you make your data harvesting workflows less error-prone and more efficient.

Syllabus

  • Introduction to HTML and Web Scraping
    • In this chapter, you'll be introduced to HyperText Markup Language (HTML), a declarative language used to structure modern websites. Using the rvest package, you'll learn how to query simple HTML elements and scrape your first table.
  • Navigation and Selection with CSS
    • Cascading Style Sheets (CSS) describe how HTML elements are displayed on a web page, including colors, fonts, and general layout. In this chapter, you'll learn why CSS selectors and combinators are a crucial ingredient for web scraping.
  • Advanced Selection with XPath
    • The CSS selectors you got to know in the last chapter are powerful but have their limitations. For example, they cannot select nodes based on the properties of their descendants. XPath to the rescue! Using this query language, you can navigate and scrape even the most hideous HTML.
  • Scraping Best Practices
    • Now that you know how to extract content from web pages, it's time to look behind the curtains. In this final chapter, you’ll learn why HTTP requests are the foundation of every scraping action and how they can be customized to comply with best practices in web scraping.
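
To give a flavor of the first three chapters, here is a minimal sketch using rvest. It is not taken from the course materials: the inlined HTML page, its ids and class names, and the variable names are all hypothetical, chosen so the example runs without network access. It scrapes a table, selects elements with a CSS selector, and then uses XPath to select nodes based on what they contain.

```r
library(rvest)

# Hypothetical page, inlined so the example needs no network access.
html <- '
<html><body>
  <table id="prices">
    <tr><th>product</th><th>price</th></tr>
    <tr><td>Tea</td><td>3.50</td></tr>
    <tr><td>Coffee</td><td>4.20</td></tr>
  </table>
  <div class="review"><p>Great tea</p><span class="stars">5</span></div>
  <div class="review"><p>No rating given</p></div>
</body></html>'

page <- read_html(html)

# Scrape a table: pick the table by its id, then convert it to a data frame.
price_table <- html_element(page, "#prices")
prices <- html_table(price_table)

# CSS selector with a child combinator: every <p> directly inside a review.
all_review_nodes <- html_elements(page, "div.review > p")
all_reviews <- html_text2(all_review_nodes)

# XPath: only the <p> of reviews that contain a star rating somewhere
# below them, i.e. a selection based on a property of a descendant.
rated_review_nodes <- html_elements(
  page,
  xpath = "//div[@class='review'][.//span[@class='stars']]/p"
)
rated_reviews <- html_text2(rated_review_nodes)

print(prices)
print(all_reviews)
print(rated_reviews)
```

For a live page, the same calls would work on the result of read_html() applied to a URL; customizing the underlying HTTP request, as covered in the final chapter, is left out of this sketch.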

Taught by

Timo Grossenbacher

Related Courses

  • Web Development (Udacity)
  • Programming Languages (University of Virginia via Udacity)
  • Building a Basic Website (University of Massachusetts Amherst via Independent)
  • Web-Technologien (openHPI)
  • iDESWEB, Introducción al desarrollo web (Miríadax)