Common Corpus - Opening Data for Building Open Source LLMs

Offered By: Linux Foundation via YouTube

Course Description

Overview

Explore the "Common Corpus" project in this keynote address by Anastasia Stasenko, Co-founder of pleias and Associate Senior Lecturer at Sorbonne-Nouvelle. Delve into the challenges and opportunities surrounding the development of fully open source and reproducible Large Language Models (LLMs). Discover how the project aims to overcome the training-data bottleneck by establishing the largest collection of fully open data for LLM training, comprising 1 trillion tokens. Learn about the legal issues surrounding copyrighted content in AI training, the importance of data quality for model performance, and how the Common Corpus project is empowering the open source AI community. Gain insights into how this corpus was constructed and its potential impact on advancing openness in generative AI.

Syllabus

Keynote: Common Corpus: Opening Data for Building Open Source LLMs - Anastasia Stasenko

Taught by

Linux Foundation
