YoVDO

Creating Synthetic Datasets for Instruction Finetuning with LLaMA and Nemotron

Offered By: Mervin Praison via YouTube

Tags

Machine Learning Courses
LLaMA (Large Language Model Meta AI) Courses
Hugging Face Courses

Course Description

Overview

Learn how to create synthetic datasets for instruction fine-tuning using LLaMA 3.1 and Nemotron 4 in this tutorial video. The workflow covers generating subtopics, creating questions for each subtopic, producing candidate responses, and filtering those responses with a reward model. Follow step-by-step instructions to set up the necessary tools, write and run the Python scripts, and upload the resulting dataset to Hugging Face. The video also covers how diverse training data can improve model performance and how to automate dataset creation. Aimed at AI developers and enthusiasts who want to improve their own models.

Syllabus

Introduction and Overview
LLaMA 3.1 & Nemotron 4 Overview
Step 1: Generating Subtopics
Step 2: Creating Questions
Step 3: Generating Responses
Step 4: Filtering Responses with Reward Model
Uploading Dataset to Hugging Face
Final Thoughts and Next Steps
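
The four numbered steps above can be sketched as a small pipeline. This is a minimal illustration, not the scripts used in the video: `generate` and `score` are hypothetical callables standing in for whatever LLM client and reward-model client you use (presumably LLaMA 3.1 and Nemotron 4 in the tutorial), and the prompts, field names, and `min_score` threshold are assumptions.

```python
import json
from typing import Callable, Dict, List

# Hypothetical interfaces (not from the video):
#   generate(prompt) -> text        wraps an LLM call
#   score(question, answer) -> float  wraps a reward-model call
Generate = Callable[[str], str]
Score = Callable[[str, str], float]


def split_lines(text: str) -> List[str]:
    """Turn a newline-separated model reply into a list of non-empty items."""
    items = (line.strip(" -*\t") for line in text.splitlines())
    return [item for item in items if item]


def build_dataset(
    topic: str,
    generate: Generate,
    score: Score,
    n_subtopics: int = 3,
    n_questions: int = 2,
    n_responses: int = 2,
    min_score: float = 0.0,
) -> List[Dict[str, object]]:
    """Run subtopics -> questions -> responses -> reward filtering."""
    rows: List[Dict[str, object]] = []
    # Step 1: generate subtopics for the topic.
    reply = generate(f"List {n_subtopics} subtopics of {topic}, one per line.")
    for sub in split_lines(reply)[:n_subtopics]:
        # Step 2: create questions for each subtopic.
        reply = generate(f"Write {n_questions} questions about {sub}, one per line.")
        for question in split_lines(reply)[:n_questions]:
            # Step 3: generate several candidate responses per question.
            candidates = [
                generate(f"Answer the question: {question}")
                for _ in range(n_responses)
            ]
            # Step 4: score each candidate and keep the best one
            # only if it clears the (assumed) threshold.
            scored = [(score(question, c), c) for c in candidates]
            best_score, best = max(scored, key=lambda pair: pair[0])
            if best_score >= min_score:
                rows.append({
                    "topic": topic,
                    "subtopic": sub,
                    "question": question,
                    "response": best,
                    "score": best_score,
                })
    return rows


def save_jsonl(rows: List[Dict[str, object]], path: str) -> None:
    """Write one JSON object per line, a common format for dataset uploads."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
```

For the upload step, one option is the Hugging Face `datasets` library: `Dataset.from_list(rows).push_to_hub(...)` with a repository name of your choice, which requires being logged in with a Hugging Face token.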


Taught by

Mervin Praison

Related Courses

LLaMA- Open and Efficient Foundation Language Models - Paper Explained
Yannic Kilcher via YouTube
Alpaca & LLaMA - Can it Compete with ChatGPT?
Venelin Valkov via YouTube
Experimenting with Alpaca & LLaMA
Aladdin Persson via YouTube
What's LLaMA? ChatLLaMA? - And Some ChatGPT/InstructGPT
Aladdin Persson via YouTube
Llama Index - Step by Step Introduction
echohive via YouTube