Creating Synthetic Datasets for Instruction Finetuning with LLaMA and Nemotron
Offered By: Mervin Praison via YouTube
Course Description
Overview
Learn how to create synthetic datasets for instruction fine-tuning using LLaMA 3.1 and Nemotron 4 in this comprehensive tutorial video. Discover techniques for generating subtopics, creating questions from those subtopics, producing high-quality responses, and filtering the responses with a reward model. Follow step-by-step instructions to set up the necessary tools, write and run Python scripts, and upload your custom dataset to Hugging Face. Gain insights into improving model performance with diverse training data and into automating the dataset creation process. Perfect for AI developers and enthusiasts looking to improve their models with custom training data.
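The generation pipeline described above runs in three stages: subtopics, then questions, then responses. It can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from the video. The `llm` argument is a hypothetical stand-in for a call to LLaMA 3.1 or any other chat model. The prompts, the helper names (`parse_lines`, `build_dataset`, `fake_llm`), and the default counts are illustrative choices.

```python
import re
from typing import Callable, Dict, List

# Hypothetical type: any function mapping a prompt to generated text.
# In practice this would wrap a call to LLaMA 3.1; here it is left abstract.
LLM = Callable[[str], str]


def parse_lines(text: str) -> List[str]:
    """Turn a newline-separated model answer into clean items, dropping list markers."""
    items = []
    for line in text.splitlines():
        # Strip leading "-", "*", "1." or "1)" markers, but leave text like "3D printing" intact.
        item = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line).strip()
        if item:
            items.append(item)
    return items


def generate_subtopics(llm: LLM, topic: str, n: int) -> List[str]:
    """Step 1: ask the model for n subtopics of a broad topic."""
    return parse_lines(llm(f"List {n} subtopics of '{topic}', one per line."))[:n]


def generate_questions(llm: LLM, subtopic: str, n: int) -> List[str]:
    """Step 2: ask the model for n questions about one subtopic."""
    return parse_lines(llm(f"Write {n} questions about '{subtopic}', one per line."))[:n]


def generate_response(llm: LLM, question: str) -> str:
    """Step 3: ask the model to answer a single question."""
    return llm(f"Answer the following question:\n{question}").strip()


def build_dataset(llm: LLM, topic: str, n_subtopics: int = 2,
                  n_questions: int = 2) -> List[Dict[str, str]]:
    """Chain steps 1-3 into a list of question/response records."""
    records = []
    for subtopic in generate_subtopics(llm, topic, n_subtopics):
        for question in generate_questions(llm, subtopic, n_questions):
            records.append({
                "subtopic": subtopic,
                "question": question,
                "response": generate_response(llm, question),
            })
    return records


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model, so the sketch runs without an API key."""
    if prompt.startswith("List"):
        return "1. Tokenization\n2. Attention"
    if prompt.startswith("Write"):
        return "- What is it?\n- Why does it matter?"
    return "A placeholder answer."


if __name__ == "__main__":
    for record in build_dataset(fake_llm, "Transformers"):
        print(record)
```

Passing the model in as a plain function keeps the pipeline independent of any particular provider's client library. To produce real data, you would swap `fake_llm` for a function that sends the prompt to your chosen endpoint and returns the reply text.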
Syllabus
Introduction and Overview
LLaMA 3.1 & Nemotron 4 Overview
Step 1: Generating Subtopics
Step 2: Creating Questions
Step 3: Generating Responses
Step 4: Filtering Responses with Reward Model
Uploading Dataset to Hugging Face
Final Thoughts and Next Steps
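Step 4 and the preparation for upload can be sketched in the same spirit. The scorer below is a hypothetical placeholder for a reward model such as Nemotron-4-340B-Reward, which rates responses on attributes like helpfulness and correctness. The attribute name, the 3.0 threshold, and the `prompt`/`completion` output keys are assumptions chosen for illustration, not settings taken from the video.

```python
import json
from typing import Callable, Dict, List

# Hypothetical scorer type: maps (question, response) to per-attribute scores.
# A real implementation would call a reward model such as Nemotron-4-340B-Reward.
Scorer = Callable[[str, str], Dict[str, float]]


def filter_by_reward(records: List[Dict], scorer: Scorer,
                     attribute: str = "helpfulness", threshold: float = 3.0) -> List[Dict]:
    """Step 4: keep only records whose chosen attribute score meets the threshold."""
    kept = []
    for rec in records:
        scores = scorer(rec["question"], rec["response"])
        # Records missing the attribute are treated as failing the filter.
        if scores.get(attribute, float("-inf")) >= threshold:
            kept.append({**rec, "scores": scores})
    return kept


def to_jsonl_lines(records: List[Dict]) -> List[str]:
    """Serialize records as JSON Lines using assumed 'prompt'/'completion' keys."""
    return [
        json.dumps({"prompt": rec["question"], "completion": rec["response"]},
                   ensure_ascii=False)
        for rec in records
    ]


def save_jsonl(records: List[Dict], path: str) -> None:
    """Write the serialized records to a local .jsonl file, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_jsonl_lines(records):
            f.write(line + "\n")


def fake_scorer(question: str, response: str) -> Dict[str, float]:
    """Offline stand-in: longer answers get a higher 'helpfulness' score."""
    return {"helpfulness": 4.0 if len(response) > 10 else 1.0}


if __name__ == "__main__":
    sample = [
        {"question": "Q1", "response": "A detailed answer."},
        {"question": "Q2", "response": "Short"},
    ]
    kept = filter_by_reward(sample, fake_scorer)
    print(to_jsonl_lines(kept))
```

To publish the filtered records, one option is the Hugging Face `datasets` library. After authenticating (for example with `huggingface-cli login`), `Dataset.from_list(kept).push_to_hub("your-username/your-dataset")` uploads the records. The repository name there is a placeholder.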
Taught by
Mervin Praison