Building Makemore - Activations & Gradients, BatchNorm
Offered By: Andrej Karpathy via YouTube
Course Description
Overview
Dive deep into the internals of multi-layer perceptrons (MLPs) in this video lecture. Explore the statistics of forward-pass activations and backward-pass gradients, and learn about the pitfalls of improperly scaled networks. Discover diagnostic tools and visualizations for assessing the health of a deep network. Understand why deep neural networks are hard to train and how Batch Normalization, a key innovation, makes training them much easier. The lecture builds intuition through code, visualizations, and a walkthrough of a real architecture (ResNet-50). Complete the provided exercises to reinforce your understanding of weight initialization and BatchNorm implementation. Topics covered include Kaiming initialization, restructuring the code in PyTorch style, and several visualization techniques for network analysis.
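The two central techniques named above can be illustrated with a small sketch. It is a minimal, stdlib-only Python illustration, not the lecture's own code, which uses PyTorch. The helper names `kaiming_std`, `preactivation_std`, and `batchnorm_forward` are hypothetical.

- Kaiming-style initialization scales weights by `gain / sqrt(fan_in)`, which keeps pre-activations at roughly unit standard deviation. The gain of 5/3 for tanh matches what PyTorch's `torch.nn.init.calculate_gain('tanh')` returns.
- BatchNorm standardizes each feature over the batch and then applies a learnable scale (`gamma`) and shift (`beta`).

```python
import math
import random

def kaiming_std(fan_in, gain=5 / 3):
    """Kaiming-style init scale: gain / sqrt(fan_in).
    5/3 is the gain PyTorch's torch.nn.init.calculate_gain returns for tanh."""
    return gain / math.sqrt(fan_in)

def preactivation_std(fan_in, fan_out, n_samples, gain, seed=0):
    """Empirical std of x @ W with x ~ N(0, 1) and W ~ N(0, kaiming_std(fan_in, gain))."""
    rng = random.Random(seed)
    w_std = kaiming_std(fan_in, gain)
    W = [[rng.gauss(0.0, w_std) for _ in range(fan_out)] for _ in range(fan_in)]
    outs = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
        for j in range(fan_out):
            # one pre-activation: dot product of the input row with column j of W
            outs.append(sum(x[i] * W[i][j] for i in range(fan_in)))
    mean = sum(outs) / len(outs)
    return math.sqrt(sum((o - mean) ** 2 for o in outs) / len(outs))

def batchnorm_forward(batch, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm sketch: normalize each feature (column) over the
    batch, then scale by gamma and shift by beta. batch is a list of rows."""
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # biased batch variance
        for i in range(n):
            x_hat = (batch[i][j] - mean) / math.sqrt(var + eps)
            out[i][j] = gamma[j] * x_hat + beta[j]
    return out

# With gain=1 the pre-activations come out with std close to 1.
print(preactivation_std(fan_in=100, fan_out=50, n_samples=200, gain=1.0))
print(batchnorm_forward([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]], [1.0, 1.0], [0.0, 0.0]))
```

This sketch leaves out the running mean and variance that a real BatchNorm layer maintains for use at inference time.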
Syllabus
intro
starter code
fixing the initial loss
fixing the saturated tanh
calculating the init scale: “Kaiming init”
batch normalization
batch normalization: summary
real example: resnet50 walkthrough
summary of the lecture
just kidding: part 2: PyTorch-ifying the code
viz #1: forward pass activations statistics
viz #2: backward pass gradient statistics
the fully linear case of no non-linearities
viz #3: parameter activation and gradient statistics
viz #4: update:data ratio over time
bringing back batchnorm, looking at the visualizations
summary of the lecture for real this time
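Several syllabus items reduce to simple numeric checks. The sketch below is an illustration with hypothetical helper names, not the lecture's code, and it rests on three assumptions.

- **Initial loss.** At initialization, a well-behaved classifier should predict a roughly uniform distribution, so its expected loss is `log(vocab_size)`. This assumes makemore's 27-character vocabulary: 26 letters plus a `.` boundary token.
- **Saturation threshold.** A tanh output with `|t| > 0.97` is treated as saturated. The threshold is a heuristic.
- **Update-to-data ratio.** This is tracked as `log10(std(lr * grad) / std(param))`. A value of about -3 is often used as a rough target, but treat that target as a rule of thumb.

```python
import math
import statistics

def expected_initial_loss(vocab_size):
    # cross-entropy of a uniform prediction: -log(1 / vocab_size) = log(vocab_size)
    return math.log(vocab_size)

def saturation_fraction(activations, threshold=0.97):
    # fraction of tanh outputs in the flat tails, where the local gradient 1 - t**2 is near zero
    flat = [a for row in activations for a in row]
    return sum(1 for a in flat if abs(a) > threshold) / len(flat)

def update_to_data_ratio(param, grad, lr):
    # log10 of (std of the SGD update lr * grad) / (std of the parameter values)
    update = [lr * g for g in grad]
    return math.log10(statistics.pstdev(update) / statistics.pstdev(param))

print(expected_initial_loss(27))                               # about 3.2958
print(saturation_fraction([[0.99, 0.1], [-0.98, 0.5]]))         # 0.5
print(update_to_data_ratio([1.0, -1.0], [1.0, -1.0], lr=1e-3))  # about -3
```

Logging these values per layer and over training steps produces the kind of plots the visualization chapters describe. A loss far above `log(vocab_size)` at step 0, or a large saturated fraction, suggests the initialization scale needs fixing.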
Taught by
Andrej Karpathy