Data Management - Full Stack Deep Learning - March 2019
Offered By: The Full Stack via YouTube
Course Description
Overview
Explore data management essentials for deep learning projects in this comprehensive lecture. Delve into data labeling, storage, versioning, and processing techniques. Learn about the data flywheel concept, annotator training, labor sources, and service comparisons. Discover database scalability, data lake organization, and versioning strategies. Examine task dependencies and workflow management tools like Luigi and Airflow. Gain practical insights for implementing efficient data pipelines in machine learning projects.
Syllabus
Introduction
Data Flywheel: an initial manually labeled dataset enables self-improvement with user data
Roadmap
Training the annotators is crucial
Sources of Labor
Service Companies
Software
Data Storage
Database: scalable storage and retrieval of structured data
Data Lake
What goes where
Data Versioning
Level 2
Motivational Example: we have to train a photo popularity predictor every night
Task Dependencies
Makefile limitations
Luigi and Airflow
Taught by
The Full Stack
Related Courses
How Google does Machine Learning 日本語版 - Google Cloud via Coursera
How Google does Machine Learning em Português Brasileiro - Google Cloud via Coursera
Машинное обучение на больших данных - Higher School of Economics via Coursera
Practical Crowdsourcing for Efficient Machine Learning - Yandex via Coursera
Introduction to Amazon SageMaker Ground Truth (Traditional Chinese) - Amazon Web Services via AWS Skill Builder