YoVDO

Solving Real World Data Science Tasks With Python Beautiful Soup - Movie Dataset Creation

Offered By: Keith Galli via YouTube

Tags

Python Courses, Data Science Courses, Web Scraping Courses

Course Description

Overview

Learn how to solve real-world data science tasks with Python and Beautiful Soup. This tutorial scrapes Wikipedia pages to build a dataset of Disney movies and covers a range of Python and data science topics along the way: web scraping with BeautifulSoup, data cleaning, testing code with Pytest, pattern matching with regular expressions, handling dates with the datetime library, saving and loading data with the Pickle library, and accessing data from APIs with the Requests library. The hands-on tasks include scraping movie information, cleaning and processing the data, and integrating external movie ratings. By the end, you will have practical experience building a movie dataset from scratch with several Python libraries.
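As a rough illustration of the infobox scraping described above, here is a minimal sketch using BeautifulSoup. It is not the instructor's code: `SAMPLE_HTML` is a hypothetical, heavily simplified stand-in for a Wikipedia infobox, and `parse_infobox` is an illustrative helper name. Real pages have more complex markup and would first be fetched over the network.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a Wikipedia infobox (not real page markup).
SAMPLE_HTML = """
<table class="infobox">
  <tr><th colspan="2">Toy Story 3</th></tr>
  <tr><th>Directed by</th><td>Lee Unkrich</td></tr>
  <tr><th>Running time</th><td>103 minutes<sup>[1]</sup></td></tr>
</table>
"""


def parse_infobox(html):
    """Return the infobox rows as a dictionary (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="infobox")

    # Remove reference markers such as [1] before reading any text.
    for sup in table.find_all("sup"):
        sup.decompose()

    info = {}
    for index, row in enumerate(table.find_all("tr")):
        header = row.find("th")
        value = row.find("td")
        if index == 0 and header is not None:
            # Assumption: the first row holds the movie title.
            info["title"] = header.get_text(" ", strip=True)
        elif header is not None and value is not None:
            info[header.get_text(" ", strip=True)] = value.get_text(" ", strip=True)
    return info


movie = parse_infobox(SAMPLE_HTML)
print(movie)
```

Each movie becomes one dictionary, so a full list of films can be stored as a list of dictionaries.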

Syllabus

- Video overview
- Check out DataCamp! (sponsored)
- Setup
- Task #1: Scrape the infobox from the Toy Story 3 wiki page and save it in a Python dictionary
- Task #2: Scrape the infobox for all movies in the List of Disney Films and save them as a list of dictionaries
- Robots.txt: Are you allowed to scrape a site?
- Task #2 (continued): Scrape the infobox for all movies in the List of Disney Films and save them as a list of dictionaries
- Save & load a dataset checkpoint (JSON file)
- Task #3: Clean our data!
- Task #3.1: Strip out all references ([1], [2], etc.) from the HTML
- Task #3.2: Split up the long strings
- Task #3.3: Examine the errors we are getting
- Task #3.4: Convert the “Running time” field to an integer
- Task #3.5: Convert the “Budget” & “Box office” fields to floats
- Task #3.6: Convert dates into datetime objects
- Saving our data again using Pickle
- Task #4: Attach IMDb, Metascore, and Rotten Tomatoes scores to the dataset (working with APIs)
- Task #5: Save the final dataset as a JSON file and as a CSV file
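
The cleaning steps in Task #3 can be sketched with the standard library alone. This is a hedged example under stated assumptions, not the code from the video: the helper names (`minutes_to_int`, `money_to_float`, `parse_date`), the regular expression, the accepted date formats, and the sample values are all illustrative.

```python
import pickle
import re
from datetime import datetime

# Assumed money format: "$200 million", "$1,000,000", and similar.
MONEY_PATTERN = re.compile(
    r"\$\s*([\d,]+(?:\.\d+)?)\s*(thousand|million|billion)?", re.IGNORECASE
)
SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}


def minutes_to_int(running_time):
    """Take the first number in a string like '103 minutes'; None if absent."""
    match = re.search(r"\d+", running_time)
    return int(match.group()) if match else None


def money_to_float(text):
    """Convert a dollar amount with an optional scale word to a float."""
    match = MONEY_PATTERN.search(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    word = match.group(2)
    return number * SCALE[word.lower()] if word else number


def parse_date(text, formats=("%B %d, %Y", "%d %B %Y")):
    """Try each assumed date format in turn; None if none of them match."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


# Illustrative sample record, shaped like one scraped infobox.
movie = {
    "Running time": "103 minutes",
    "Budget": "$200 million",
    "Release date": "June 18, 2010",
}

cleaned = {
    "Running time (int)": minutes_to_int(movie["Running time"]),
    "Budget (float)": money_to_float(movie["Budget"]),
    "Release date (datetime)": parse_date(movie["Release date"]),
}

# datetime objects are not JSON-serializable by default, but Pickle handles them.
restored = pickle.loads(pickle.dumps(cleaned))
print(restored)
```

Returning `None` for values that do not match keeps a full scraping run from stopping on a single malformed field, at the cost of having to check for missing values later.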


Taught by

Keith Galli

Related Courses

- Data Analysis (Johns Hopkins University via Coursera)
- Computing for Data Analysis (Johns Hopkins University via Coursera)
- Scientific Computing (University of Washington via Coursera)
- Introduction to Data Science (University of Washington via Coursera)
- Web Intelligence and Big Data (Indian Institute of Technology Delhi via Coursera)