Continuous Data Pipeline for Real-Time Benchmarking and Data Set Augmentation
Offered By: Data Council via YouTube
Course Description
Overview
Explore a 15-minute conference talk from Data Council on building continuous data pipelines for real-time benchmarking and dataset augmentation. Learn how to generate datasets and implement real-time precision/recall splits to detect data shifts, prioritize data collection, and retrain models. Discover the importance of curating representative datasets for accurate ML systems and monitoring post-deployment metrics. Gain insights into addressing data shifts in unstructured language models and leveraging open-source APIs and annotation tools to streamline processes. Presented by Ivan Aguilar, a data scientist at Teleskope, this talk covers topics such as the problem statement, usual approaches, open-source data APIs, task overview, annotations overview, and final thoughts on improving ML model performance through effective data management strategies.
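The overview mentions using precision/recall splits to detect data shifts. The sketch below illustrates one way that idea could work and is not the method presented in the talk: it groups labeled predictions by time window, computes precision and recall for each window, and flags any window where either metric falls well below a baseline window. The `Record` fields, the function names, and the `max_drop` threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    window: str      # hypothetical time bucket, e.g. "2024-01"
    label: int       # ground-truth annotation (1 = positive, 0 = negative)
    prediction: int  # model output (1 = positive, 0 = negative)


def precision_recall(records):
    """Return (precision, recall) for a list of binary-labeled records."""
    tp = sum(1 for r in records if r.prediction == 1 and r.label == 1)
    fp = sum(1 for r in records if r.prediction == 1 and r.label == 0)
    fn = sum(1 for r in records if r.prediction == 0 and r.label == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def split_by_window(records):
    """Group records by their time window, preserving first-seen order."""
    groups = {}
    for r in records:
        groups.setdefault(r.window, []).append(r)
    return groups


def flag_shifted_windows(records, baseline_window, max_drop=0.1):
    """Flag windows whose precision or recall drops more than max_drop
    below the baseline window (an assumed, simplistic shift signal)."""
    groups = split_by_window(records)
    base_p, base_r = precision_recall(groups[baseline_window])
    flagged = []
    for window, recs in groups.items():
        if window == baseline_window:
            continue
        p, r = precision_recall(recs)
        if base_p - p > max_drop or base_r - r > max_drop:
            flagged.append(window)
    return flagged
```

Windows flagged this way could be candidates for prioritized data collection, annotation, and retraining, which is the loop the overview describes. A production pipeline would likely need per-class splits and some handling of small sample sizes, which this sketch omits.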
Syllabus
Intro
Why is this a problem?
Usual Approaches
Open-Source Data APIs
Task Overview
Annotations Overview
Final Thoughts
Taught by
Data Council
Related Courses
Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent