LangChain Data Loaders, Tokenizers, Chunking, and Datasets - Data Prep

Offered By: James Briggs via YouTube

Tags

LangChain Courses
Data Preparation Courses

Course Description

Overview

This tutorial video covers the data preparation steps needed before feeding text to large language models (LLMs). It shows how to load documents with LangChain document loaders, measure text length in tokens with OpenAI's tiktoken tokenizer, split text into chunks with LangChain text splitters, and store the results as Hugging Face datasets. The examples target OpenAI's embedding and completion models, but the same principles apply to other LLMs, such as those available through Hugging Face and Cohere. The instructor downloads the LangChain documentation, loads it with a document loader, works out how much text an LLM can accept, and splits the text recursively with chunk overlap. The video closes by explaining why careful data preparation matters and by creating, saving, and loading a dataset as a JSONL file.
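The recursive splitting with chunk overlap described above can be sketched in plain Python. This is a minimal, hypothetical illustration of the idea, not LangChain's `RecursiveCharacterTextSplitter` implementation and not the code used in the video: the function names and default values are made up for this example. The separator order (paragraphs, then lines, then words, then single characters) and the `length_function` hook mirror the options LangChain's splitter exposes. Lengths are measured in characters by default; to count tokens instead, one could pass something like `lambda t: len(tiktoken.get_encoding("cl100k_base").encode(t))`, assuming the `tiktoken` package is installed.

```python
from typing import Callable, List, Optional

# Coarsest to finest: paragraphs, lines, words, single characters.
DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]


def _merge(pieces: List[str], sep: str, chunk_size: int, chunk_overlap: int,
           length_function: Callable[[str], int]) -> List[str]:
    """Greedily join small pieces into chunks no longer than chunk_size,
    carrying up to chunk_overlap worth of trailing pieces into the next chunk."""
    chunks: List[str] = []
    window: List[str] = []  # pieces making up the chunk being built
    for piece in pieces:
        if window and length_function(sep.join(window + [piece])) > chunk_size:
            chunks.append(sep.join(window))
            # Drop leading pieces until the remainder fits the overlap budget
            # and still leaves room for the incoming piece.
            while window and (
                length_function(sep.join(window)) > chunk_overlap
                or length_function(sep.join(window + [piece])) > chunk_size
            ):
                window.pop(0)
        window.append(piece)
    if window:
        chunks.append(sep.join(window))
    return chunks


def recursive_split(text: str, chunk_size: int = 400, chunk_overlap: int = 20,
                    separators: Optional[List[str]] = None,
                    length_function: Callable[[str], int] = len) -> List[str]:
    """Split on the coarsest separator present in the text, recursing with
    finer separators into any piece that is still longer than chunk_size."""
    if chunk_overlap > chunk_size:
        raise ValueError("chunk_overlap must not exceed chunk_size")
    if separators is None:
        separators = DEFAULT_SEPARATORS
    if not separators:
        return [text]
    # Pick the first separator that occurs in the text ("" always matches).
    idx = len(separators) - 1
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            idx = i
            break
    sep, finer = separators[idx], separators[idx + 1:]
    pieces = text.split(sep) if sep else list(text)

    chunks: List[str] = []
    small: List[str] = []  # consecutive pieces that already fit
    for piece in pieces:
        if not piece:
            continue  # skip empty strings left by repeated separators
        if length_function(piece) <= chunk_size:
            small.append(piece)
            continue
        # Flush pending small pieces, then handle the oversized piece.
        if small:
            chunks.extend(_merge(small, sep, chunk_size, chunk_overlap, length_function))
            small = []
        if finer:
            chunks.extend(recursive_split(piece, chunk_size, chunk_overlap, finer, length_function))
        else:
            chunks.append(piece)  # no finer separator left; keep as-is
    if small:
        chunks.extend(_merge(small, sep, chunk_size, chunk_overlap, length_function))
    return chunks


if __name__ == "__main__":
    words = "one two three four five six seven eight nine ten"
    print(recursive_split(words, chunk_size=15, chunk_overlap=5))
    # → ['one two three', 'three four five', 'five six seven', 'seven eight', 'eight nine ten']
```

Each chunk repeats the tail of the previous one, so a sentence or idea that straddles a boundary is still seen together in at least one chunk. Keeping the overlap small relative to `chunk_size` limits how much text is duplicated.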

Syllabus

Data preparation for LLMs
Downloading the LangChain docs
Using LangChain document loaders
How much text can we fit in LLMs?
Using the tiktoken tokenizer to find the length of text
Initializing the recursive text splitter in LangChain
Why we use chunk overlap
Chunking with RecursiveCharacterTextSplitter
Creating the dataset
Saving and loading with a JSONL file
Data prep is important
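The final syllabus steps, creating a dataset and saving and loading it as JSONL, can be sketched with the Python standard library alone. JSONL stores one JSON object per line, so a file can be streamed or appended to record by record. The record fields (`id`, `text`, `source`), the helper names, and the example file names below are hypothetical choices for this illustration, not necessarily the schema used in the video. A file written this way can typically also be read by the Hugging Face `datasets` library with `load_dataset("json", data_files=...)`.

```python
import json
from typing import Any, Dict, Iterable, List


def make_records(source: str, chunks: Iterable[str]) -> List[Dict[str, Any]]:
    """Attach a simple id and the source name to each chunk (hypothetical schema)."""
    return [
        {"id": f"{source}-{i}", "text": chunk, "source": source}
        for i, chunk in enumerate(chunks)
    ]


def save_jsonl(records: Iterable[Dict[str, Any]], path: str) -> None:
    """Write one JSON object per line; newlines inside text are escaped by json.dumps."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read the records back, skipping any blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    records = make_records("example.html", ["First chunk.", "Second chunk."])
    save_jsonl(records, "train.jsonl")
    print(load_jsonl("train.jsonl") == records)  # → True
```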


Taught by

James Briggs
