YoVDO

Data Preparation Tips and Tricks for Machine Learning

Offered By: Trelis Research via YouTube

Tags

Data Preparation Courses, Machine Learning Courses, Clustering Courses, Fine-Tuning Courses

Course Description

Overview

Explore data preparation techniques for machine learning in this one-hour video tutorial. Learn about filtering and deduplication using FineWeb, balancing concepts in a dataset with hierarchical k-means clustering, and see a live demonstration of dataset balancing using OpenAssistant. The Q&A covers handling labeled data, setting chat templates for tokenizers, addressing hallucinations, and working with mixed-language datasets, along with text classification models, extracting structured data from PDFs, multi-GPU training, and RAG pipelines. Additional resources and a Colab notebook accompany the video.

Syllabus

Welcome
FineWeb
Clustering and balancing data - Meta Paper
Clustering analysis in Colab
How to prepare chat / Q&A datasets synthetically
Q&A
Handling labeled data for fine-tuning
Setting a chat template for a tokenizer without one
Considerations on novel data and hallucinations
Issues with tokenizer and chat template not aligning
Using mixed-language datasets and their impact on training
Recommendations for models suitable for text classification
Extracting structured data from PDFs and tables
Multi-GPU training considerations
Using the LLM2Vec method for embeddings
RAG pipeline suggestions


Taught by

Trelis Research

Related Courses

TensorFlow: Working with NLP
LinkedIn Learning
Introduction to Video Editing - Video Editing Tutorials
Great Learning via YouTube
HuggingFace Crash Course - Sentiment Analysis, Model Hub, Fine Tuning
Python Engineer via YouTube
GPT3 and Finetuning the Core Objective Functions - A Deep Dive
David Shapiro ~ AI via YouTube
How to Build a Q&A AI in Python - Open-Domain Question-Answering
James Briggs via YouTube