YoVDO

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing and Attribution in AI

Offered By: USC Information Sciences Institute via YouTube

Tags

Artificial Intelligence Courses, Machine Learning Courses, Open Data Courses, Language Models Courses

Course Description

Overview

Explore the Data Provenance Initiative, a groundbreaking effort to audit and trace over 1800 text datasets used in AI training. Learn about the legal and ethical concerns surrounding dataset licensing and attribution in the AI industry. Discover the tools and standards developed to trace dataset lineage, from sources and creators to license conditions and subsequent use. Examine the landscape analysis revealing stark differences between commercially open and closed datasets, including their composition and focus areas. Gain insights from speakers Anthony Chen, an engineer at Google DeepMind, and Shayne Longpre, a PhD candidate at MIT, as they present their findings and discuss the implications for data transparency and understanding in AI development. Delve into the challenges of dataset monopolization in areas such as low-resource languages, creative tasks, and synthetic training data.

Syllabus

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI


Taught by

USC Information Sciences Institute

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Artificial Intelligence for Robotics
Stanford University via Udacity
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent