Common Corpus - Opening Data for Building Open Source LLMs
Offered By: Linux Foundation via YouTube
Course Description
Overview
Explore the groundbreaking "Common Corpus" project in this keynote address by Anastasia Stasenko, Co-founder of pleias and Associate Senior Lecturer at Sorbonne-Nouvelle. Delve into the challenges and opportunities surrounding the development of fully open source and reproducible Large Language Models (LLMs). Discover how the project aims to overcome the training-data bottleneck by establishing the largest collection of fully open data for LLM training, comprising 1 trillion (1T) tokens. Learn about the legal issues surrounding copyrighted content in AI training, the importance of data quality for model performance, and how the Common Corpus project is empowering the open source AI community. Gain insights into how this massive corpus was constructed and its potential impact on advancing openness in generative AI.
Syllabus
Keynote: Common Corpus: Opening Data for Building Open Source LLMs - Anastasia Stasenko
Taught by
Linux Foundation
Related Courses
Building and Managing Superior Skills - State University of New York via Coursera
ChatGPT et IA : mode d'emploi pour managers et RH - CNAM via France Université Numerique
Digital Skills: Artificial Intelligence - Accenture via FutureLearn
AI Foundations for Everyone - IBM via Coursera
Design a Feminist Chatbot - Institute of Coding via FutureLearn