The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing and Attribution in AI
Offered By: USC Information Sciences Institute via YouTube
Course Description
Overview
Explore the Data Provenance Initiative, a large-scale effort to audit and trace over 1,800 text datasets used in AI training. Learn about the legal and ethical concerns surrounding dataset licensing and attribution in the AI industry. Discover the tools and standards developed to trace dataset lineage, from sources and creators to license conditions and subsequent use. Examine the landscape analysis, which reveals stark differences in composition and focus areas between commercially open and closed datasets. Gain insights from speakers Anthony Chen, an engineer at Google DeepMind, and Shayne Longpre, a PhD candidate at MIT, as they present their findings and discuss the implications for data transparency and understanding in AI development. Delve into the challenges posed by dataset monopolization in areas such as low-resource languages, creative tasks, and synthetic training data.
Syllabus
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
Taught by
USC Information Sciences Institute
Related Courses
Introduction to Artificial Intelligence
Stanford University via Udacity
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Artificial Intelligence for Robotics
Stanford University via Udacity
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent