Data Ingestion with Python

Offered By: LinkedIn Learning

Tags

Python Courses, Web Scraping Courses, APIs Courses, Database Management Courses, Data Ingestion Courses

Course Description

Overview

Learn how to use Python tools and techniques to solve one of the main challenges data scientists face: getting good data to train their algorithms.

Syllabus

Introduction
  • Why is data ingestion important?
  • What you should know
  • Using the exercise files
  • Using the CoderPad quizzes
1. Data Ingestion Overview
  • Overview of data scientists' work
  • Where does data come from?
  • Different types of data
  • The data pipeline (ETL)
  • Final destination (data lake)
2. Reading Files
  • Working in CSV
  • Working in XML
  • Working in Parquet, Avro, and ORC
  • Unstructured text
  • JSON
  • Solution: CSV to JSON
3. Calling APIs
  • Working with JSON
  • Making HTTP calls
  • Processing event-based data
  • Solution: Location from IP
4. Web Scraping
  • Try to find an API
  • Working with Beautiful Soup
  • Working with Scrapy
  • Working with Selenium
  • Other considerations
  • Solution: Get stock information from HTML
5. Schema
  • What are schemas?
  • Working with ontologies
  • What should be in a schema
  • Schema changes
  • Schema validations
6. Working with Databases
  • Types of databases
  • Hosted and cost of ops
  • Working with relational databases
  • Working with key-value databases
  • Working with document databases
  • Working with graph databases
  • Solution: ETL
7. Troubleshooting Data
  • Data is never 100% okay
  • Causes of errors
  • Filling missing values
  • Finding outliers (manual)
  • Finding outliers (ML)
  • Solution: Clean rides dataset
8. Data KPIs and Process
  • Design your data
  • KPIs
  • What to monitor?
Conclusion
  • Next steps

Taught by

Miki Tebeka

Related Courses

  • Web Development (Udacity)
  • Do-It-Yourself Geo Apps (Esri via Independent)
  • Software Construction: Object-Oriented Design (The University of British Columbia via edX)
  • Full-Text Search with SAP HANA Platform (SAP Learning)
  • Tools for Data Science (IBM via Coursera)