The Spelled-Out Intro to Language Modeling - Building Makemore
Offered By: Andrej Karpathy via YouTube
Syllabus
intro
reading and exploring the dataset
exploring the bigrams in the dataset
counting bigrams in a python dictionary
counting bigrams in a 2D torch tensor ("training the model")
visualizing the bigram tensor
deleting spurious <S> and <E> tokens in favor of a single . token
sampling from the model
efficiency! vectorized normalization of the rows, tensor broadcasting
loss function (the negative log likelihood of the data under our model)
model smoothing with fake counts
PART 2: the neural network approach: intro
creating the bigram dataset for the neural net
feeding integers into neural nets? one-hot encodings
the "neural net": one linear layer of neurons implemented with matrix multiplication
transforming neural net outputs into probabilities: the softmax
summary, preview to next steps, reference to micrograd
vectorized loss
backward and update, in PyTorch
putting everything together
note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
note 2: model smoothing as regularization loss
sampling from the neural net
conclusion
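The counting half of the syllabus (bigram counts, a single '.' start/end token, row normalization, fake-count smoothing, sampling, and the average negative log likelihood) fits in a few dozen lines. Below is a minimal sketch using only the Python standard library rather than the PyTorch tensors used in the course. The toy word list is hypothetical, and the helper names (`nll`, `sample`, `one_hot_matmul`) are illustrative, not the course's code. The last helper checks note 1 from the syllabus: multiplying a one-hot vector by a weight matrix just selects one row of that matrix.

```python
import math
import random

# Hypothetical toy word list; the course trains on a much larger names dataset.
words = ["emma", "olivia", "ava", "isabella", "sophia"]

# Vocabulary: a single '.' token marks both the start and the end of a word.
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
itos = {i: ch for ch, i in stoi.items()}
V = len(stoi)

# "Training": count every bigram into a V x V table.
N = [[0] * V for _ in range(V)]
for w in words:
    seq = ["."] + list(w) + ["."]
    for a, b in zip(seq, seq[1:]):
        N[stoi[a]][stoi[b]] += 1

# Model smoothing: add one fake count to every cell, then normalize each row
# so that row i is the distribution over the character following character i.
smoothing = 1
P = []
for row in N:
    smoothed = [c + smoothing for c in row]
    total = sum(smoothed)
    P.append([c / total for c in smoothed])

def nll(ws):
    """Average negative log likelihood of the words ws under the bigram table P."""
    log_likelihood, n = 0.0, 0
    for w in ws:
        seq = ["."] + list(w) + ["."]
        for a, b in zip(seq, seq[1:]):
            log_likelihood += math.log(P[stoi[a]][stoi[b]])
            n += 1
    return -log_likelihood / n

def sample(rng):
    """Start at '.', repeatedly draw the next character from the current row, stop at '.'."""
    out, ix = [], 0
    while True:
        ix = rng.choices(range(V), weights=P[ix])[0]
        if ix == 0:
            return "".join(out)
        out.append(itos[ix])

def one_hot_matmul(i, W):
    """Multiply a one-hot row vector (1.0 at position i) by matrix W."""
    x = [1.0 if j == i else 0.0 for j in range(len(W))]
    return [sum(x[k] * W[k][c] for k in range(len(W))) for c in range(len(W[0]))]

print(f"average NLL: {nll(words):.4f}")
rng = random.Random(42)
print([sample(rng) for _ in range(3)])
print(one_hot_matmul(stoi["a"], P) == P[stoi["a"]])
```

Smoothing guarantees that no bigram has probability zero, so the log likelihood of any word stays finite and sampling always has a nonzero chance of reaching the end token.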
Taught by
Andrej Karpathy