
Building Makemore - MLP

Offered By: Andrej Karpathy via YouTube

Tags

Natural Language Processing (NLP) Courses
Machine Learning Courses
PyTorch Courses
Overfitting Courses
Model Training Courses
Hyperparameters Courses
Multilayer Perceptron Courses

Course Description

Overview

This video tutorial implements a multilayer perceptron (MLP) character-level language model and, along the way, introduces core machine learning concepts: model training, learning rate tuning, hyperparameters, evaluation, train/dev/test splits, and under/overfitting. The instructor builds a training dataset, implements the embedding lookup table and hidden layer, and examines the internals of PyTorch tensors (storage and views). The output layer and negative log likelihood loss are implemented by hand and then replaced with F.cross_entropy. The training loop is first used to overfit a single batch and then run on the full dataset with minibatches. Later sections cover finding a good initial learning rate, splitting the dataset, experimenting with a larger hidden layer and a larger embedding size, visualizing the character embeddings, and sampling from the trained model. Provided resources include GitHub repositories, Jupyter notebooks, and relevant research papers, along with suggested exercises.
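
As a rough illustration of the negative log likelihood loss mentioned above, the following pure-Python sketch computes the loss for a single example from raw logits. It is a stand-in written for this listing, not PyTorch's implementation: the function names are hypothetical, and F.cross_entropy operates on batched tensors and averages over the batch by default. The comparison with a naive version hints at one commonly cited reason for preferring a fused routine, namely numerical stability for large logits.

```python
import math


def nll_from_logits(logits, target):
    """Negative log likelihood of `target` under softmax(logits), computed stably.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    m = max(logits)  # subtracting the max leaves softmax unchanged but avoids overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def naive_nll(logits, target):
    """Same quantity via explicit exponentiation; overflows for large logits."""
    counts = [math.exp(z) for z in logits]
    prob = counts[target] / sum(counts)
    return -math.log(prob)


logits = [1.0, 2.0, 3.0]
print(nll_from_logits(logits, 2), naive_nll(logits, 2))

# With a very large logit the stable version still works,
# while the naive version raises OverflowError inside math.exp.
print(nll_from_logits([1000.0, 0.0], 0))
```

With uniform logits over 27 classes (an assumed vocabulary of 26 letters plus one special token), the loss equals log(27), which is the value an untrained model with no preference among characters would produce.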

Syllabus

intro
Bengio et al. 2003 MLP language model paper walkthrough
re-building our training dataset
implementing the embedding lookup table
implementing the hidden layer + internals of torch.Tensor: storage, views
implementing the output layer
implementing the negative log likelihood loss
summary of the full network
introducing F.cross_entropy and why
implementing the training loop, overfitting one batch
training on the full dataset, minibatches
finding a good initial learning rate
splitting up the dataset into train/val/test splits and why
experiment: larger hidden layer
visualizing the character embeddings
experiment: larger embedding size
summary of our final code, conclusion
sampling from the model
Google Colab (new!!) notebook advertisement
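
The dataset-building and splitting steps listed in the syllabus can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the course's code: the word list, the context length of 3, the "." padding/end token at index 0, the 80/10/10 split fractions, and the helper names are all chosen here for illustration.

```python
import random


def build_dataset(words, stoi, block_size=3):
    """Turn words into (context, target) index pairs for next-character prediction."""
    X, Y = [], []
    for word in words:
        context = [0] * block_size  # start from a context made of "." tokens
        for ch in word + ".":
            ix = stoi[ch]
            X.append(list(context))
            Y.append(ix)
            context = context[1:] + [ix]  # slide the window one character forward
    return X, Y


def split_words(words, seed=42, train_frac=0.8, dev_frac=0.1):
    """Shuffle the words and split them into train/dev/test lists (assumed 80/10/10)."""
    words = list(words)
    random.Random(seed).shuffle(words)
    n1 = int(train_frac * len(words))
    n2 = int((train_frac + dev_frac) * len(words))
    return words[:n1], words[n1:n2], words[n2:]


# Hypothetical toy word list; any list of lowercase names would do.
words = ["emma", "olivia", "ava", "isabella", "sophia",
         "mia", "amelia", "harper", "evelyn", "luna"]
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0  # index 0 is reserved for the padding/end token

train, dev, test = split_words(words)
Xtr, Ytr = build_dataset(train, stoi)
print(len(train), len(dev), len(test), len(Xtr))
```

Splitting by word before building examples keeps every context window of a given word inside a single split, so the dev and test sets contain only words the model never saw during training.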


Taught by

Andrej Karpathy

Related Courses

DP-100 Part 2 - Modeling (A Cloud Guru)
Aerial Image Segmentation with PyTorch (Coursera Project Network via Coursera)
AI Capstone Project with Deep Learning (IBM via Coursera)
Applied Machine Learning (Johns Hopkins University via Coursera)
Apply Generative Adversarial Networks (GANs) (DeepLearning.AI via Coursera)