Data Ingestion with Python
Offered By: LinkedIn Learning
Course Description
Overview
Learn how to use Python tools and techniques to solve one of the main challenges data scientists face: getting good data to train their algorithms.
Syllabus
Introduction
- Why is data ingestion important?
- What you should know
- Using the exercise files
- Using the CoderPad quizzes
- Overview of data scientists' work
- Where does data come from?
- Different types of data
- The data pipeline (ETL)
- Final destination (data lake)
- Working in CSV
- Working in XML
- Working in Parquet, Avro, and ORC
- Unstructured text
- JSON
- Solution: CSV to JSON
- Working with JSON
- Making HTTP calls
- Processing event-based data
- Solution: Location from IP
- Try to find an API
- Working with Beautiful Soup
- Working with Scrapy
- Working with Selenium
- Other considerations
- Solution: Get stock information from HTML
- What are schemas?
- Working with ontologies
- What should be in a schema
- Schema changes
- Schema validations
- Types of databases
- Hosted and cost of ops
- Working with relational databases
- Working with key-value databases
- Working with document databases
- Working with graph databases
- Solution: ETL
- Data is never 100% okay
- Causes of errors
- Filling missing values
- Finding outliers (manual)
- Finding outliers (ML)
- Solution: Clean rides dataset
- Design your data
- KPIs
- What to monitor?
- Next steps
Taught by
Miki Tebeka
Related Courses
- Datenmanagement mit SQL, openHPI
- Programming Cloud Services for Android Handheld Systems, Vanderbilt University via Coursera
- Getting and Cleaning Data, Johns Hopkins University via Coursera
- مدخل إلى برمجة مواقع الإنترنت باستخدام لغة Ruby, Rwaq (رواق)
- MongoDB for .NET Developers, MongoDB University