How Replit Trained Their Own LLMs - LLM Bootcamp

Offered By: The Full Stack via YouTube

Tags

LLM (Large Language Model) Courses
Data Processing Courses
Data Pipelines Courses

Course Description

Overview

Explore the comprehensive process of training custom Large Language Models (LLMs) in this 32-minute conference talk by Reza Shabani from Replit. Gain insights into the entire workflow, from data processing to deployment, including the modern LLM stack, data pipelines using Databricks and Hugging Face, preprocessing techniques, tokenizer training, and running training with MosaicML and Weights & Biases. Learn about testing and evaluation methods using HumanEval and Hugging Face, as well as deployment strategies involving FasterTransformer, Triton Server, and Kubernetes. Discover valuable lessons on data-centrism, evaluation, and collaboration, and understand the qualities that make an effective LLM engineer.

Syllabus

Why train your own LLMs?
The Modern LLM Stack
Data Pipelines: Databricks & Hugging Face
Preprocessing
Tokenizer Training
Running Training: MosaicML, Weights & Biases
Testing & Evaluation: HumanEval, Hugging Face
Deployment: FasterTransformer, Triton Server, k8s
Lessons learned: data-centrism, eval, and collaboration
What makes a good LLM engineer?


Taught by

The Full Stack

Related Courses

Coding the Matrix: Linear Algebra through Computer Science Applications
Brown University via Coursera
How Machines Think - An Introduction to Computing Technologies (كيف تفكر الآلات - مقدمة في تقنيات الحوسبة)
King Fahd University of Petroleum and Minerals via Rwaq (رواق)
Data Science and Situational Analysis: Behind the Scenes of Big Data (Datascience et Analyse situationnelle : dans les coulisses du Big Data)
IONIS via IONIS
Data Lakes for Big Data
EdCast
Statistics I: Fundamentals of Data Analysis (統計学Ⅰ:データ分析の基礎, ga014)
University of Tokyo via gacco