YoVDO

Building an LLM Fine-Tuning Dataset - From Reddit Comments to QLoRA Training

Offered By: sentdex via YouTube

Tags

Machine Learning Courses, BigQuery Courses, JSON Courses, Data Preprocessing Courses, Language Models Courses, Fine-Tuning Courses, QLoRA Courses

Course Description

Overview

Learn how to build a dataset for QLoRA fine-tuning of a language model in this video tutorial. Compare the available sources of Reddit data: torrent files, Archive.org, and BigQuery. Follow the steps for exporting Reddit data from BigQuery, decompressing the gzip archives, and recombining them for the target subreddits. Then decide how to structure the data, build training samples, and save them to a database. Finally, create customized training JSON files, run QLoRA training, and review the results.
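The decompress-and-recombine steps can be illustrated with a minimal Python sketch. It is not taken from the video: it assumes each exported archive is a gzip-compressed file with one JSON object per line and that each object carries a `subreddit` field. The function names and file layout are hypothetical.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_gzip_json_lines(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def recombine_for_subreddits(
    archives: Iterable[Path], targets: set[str], out_path: Path
) -> int:
    """Copy records from all archives whose subreddit is in `targets`
    into a single uncompressed JSON Lines file; return how many were kept."""
    wanted = {name.lower() for name in targets}
    kept = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for archive in archives:
            for record in iter_gzip_json_lines(archive):
                # Assumption: the export includes a "subreddit" field.
                if str(record.get("subreddit", "")).lower() in wanted:
                    out.write(json.dumps(record) + "\n")
                    kept += 1
    return kept
```

A call such as `recombine_for_subreddits(sorted(Path("export").glob("*.gz")), {"wallstreetbets"}, Path("wsb.jsonl"))` would stream every archive once and never hold a whole file in memory; the directory and subreddit names here are placeholders.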

Syllabus

- Introduction to dataset building for fine-tuning
- The Reddit dataset options: torrent, Archive.org, BigQuery
- Exporting Reddit (and some other) data from BigQuery
- Decompressing all of the gzip archives
- Re-combining the archives for target subreddits
- How to structure the data
- Building training samples and saving to database
- Creating customized training JSON files
- QLoRA training and results
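
The sample-building, database, and JSON-export steps in the syllabus might look roughly like the sketch below. This is an illustration under stated assumptions, not the approach shown in the video: it assumes each comment has `id`, `parent_id`, `body`, and `score` fields, that a `parent_id` starting with `t1_` refers to another comment, and it uses a hypothetical prompt template and score threshold.

```python
import json
import sqlite3


def build_samples(comments: list[dict], min_score: int = 2) -> list[dict]:
    """Pair each reply with its parent comment to form one training sample."""
    by_id = {c["id"]: c for c in comments}
    samples = []
    for reply in comments:
        parent_ref = reply.get("parent_id", "")
        # Assumption: "t1_<id>" marks a reply to a comment, not to a post.
        if not parent_ref.startswith("t1_"):
            continue
        parent = by_id.get(parent_ref[3:])
        score = reply.get("score", 0)
        if parent is None or score < min_score:
            continue
        samples.append(
            {"prompt": parent["body"], "response": reply["body"], "score": score}
        )
    return samples


def save_samples(conn: sqlite3.Connection, samples: list[dict]) -> None:
    """Store samples in a SQLite table so they can be re-queried later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(prompt TEXT, response TEXT, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (:prompt, :response, :score)", samples
    )
    conn.commit()


def export_training_json(conn: sqlite3.Connection, limit: int) -> str:
    """Return the top-scoring samples as a JSON list of {"text": ...} records."""
    rows = conn.execute(
        "SELECT prompt, response FROM samples ORDER BY score DESC LIMIT ?",
        (limit,),
    ).fetchall()
    # Hypothetical template; the format a given trainer expects may differ.
    records = [{"text": f"### Comment:\n{p}\n\n### Reply:\n{r}"} for p, r in rows]
    return json.dumps(records, indent=2)
```

Keeping the samples in a database before exporting makes it cheap to produce several training files with different filters (score cutoffs, sample counts) without rebuilding the parent/reply pairs each time.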


Taught by

sentdex

Related Courses

Fine-Tuning LLM with QLoRA on Single GPU - Training Falcon-7b on ChatBot Support FAQ Dataset
Venelin Valkov via YouTube
Deploy LLM to Production on Single GPU - REST API for Falcon 7B with QLoRA on Inference Endpoints
Venelin Valkov via YouTube
Generative AI: Fine-Tuning LLM Models Crash Course
Krish Naik via YouTube
Aligning Open Language Models - Stanford CS25 Lecture
Stanford University via YouTube
Fine-Tuning LLM Models - Generative AI Course
freeCodeCamp