YoVDO

Common Corpus - Opening Data for Building Open Source LLMs

Offered By: Linux Foundation via YouTube

Tags

Generative AI Courses
Machine Learning Courses
AI Ethics Courses
Open Data Courses

Course Description

Overview

Explore the "Common Corpus" project in this keynote address by Anastasia Stasenko, Co-founder of pleias and Associate Senior Lecturer at Sorbonne-Nouvelle. The talk covers the challenges and opportunities in developing fully open source and reproducible Large Language Models (LLMs). Discover how the project aims to overcome the training-data bottleneck by establishing the largest collection of fully open data for LLM training, comprising 1 trillion tokens. Learn about the legal issues surrounding copyrighted content in AI training, the importance of data quality for model performance, and how the Common Corpus project supports the open source AI community. Gain insights into how this corpus was constructed and its potential impact on advancing openness in generative AI.

Syllabus

Keynote: Common Corpus: Opening Data for Building Open Source LLMs - Anastasia Stasenko


Taught by

Linux Foundation

Related Courses

Building and Managing Superior Skills
State University of New York via Coursera
ChatGPT et IA : mode d'emploi pour managers et RH
CNAM via France Université Numerique
Digital Skills: Artificial Intelligence
Accenture via FutureLearn
AI Foundations for Everyone
IBM via Coursera
Design a Feminist Chatbot
Institute of Coding via FutureLearn