Grokking - Generalization Beyond Overfitting on Small Algorithmic Datasets

Offered By: Yannic Kilcher via YouTube

Tags

- Generalization Courses
- Neural Networks Courses
- Overfitting Courses
- Latent Space Courses

Course Description

Overview

Explore "grokking" in neural networks through this in-depth video explanation of the research paper of the same name. See how networks trained on small algorithmic datasets can, well after overfitting the training data, suddenly jump from chance-level performance to perfect generalization. Examine how the structure of the underlying binary operations emerges in the learned latent spaces, and investigate which quantities influence grokking. Gain insight into the role of smoothness, why simple explanations win, and how weight decay encourages simplicity in what a network learns. The outline also covers the related double descent phenomenon and the binary operations datasets used in the experiments.
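
To make "binary operations dataset" concrete, here is a minimal sketch, assuming the operation is modular addition over a small prime. The function name `make_binary_op_dataset`, the modulus `p=7`, and the 50% train split are illustrative choices, not details taken from the video.

```python
import random


def make_binary_op_dataset(p=7, train_fraction=0.5, seed=0):
    """Build every equation a + b = c (mod p) and split it into train/validation."""
    # Full operation table: one example per ordered pair (a, b).
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(examples)
    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]


train, val = make_binary_op_dataset()
print(len(train), len(val))  # 49 table cells split into 24 and 25
```

A network sees only the training cells of the table and is scored on the held-out cells, so filling them in correctly requires learning the operation rather than memorizing examples.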

Syllabus

- Intro & Overview
- The Grokking Phenomenon
- Related: Double Descent
- Binary Operations Datasets
- What quantities influence grokking?
- Learned Emerging Structure
- The role of smoothness
- Simple explanations win
- Why does weight decay encourage simplicity?
- Appendix
- Conclusion & Comments
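
The syllabus item on weight decay can be illustrated with a small sketch, assuming plain gradient descent with an L2-style decay term. The function `sgd_step_with_weight_decay` and its hyperparameters are hypothetical, and the optimizer discussed in the video may differ.

```python
def sgd_step_with_weight_decay(weights, grads, lr=0.1, weight_decay=0.5):
    """One gradient step in which every weight is also pulled toward zero."""
    # The decay term adds weight_decay * w to the gradient of each weight.
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


weights = [2.0, -4.0]
for _ in range(3):
    # Zero gradient: the training loss is indifferent, so only the decay acts.
    weights = sgd_step_with_weight_decay(weights, grads=[0.0, 0.0])

print(weights)  # each step scaled every weight by 1 - lr * weight_decay = 0.95
```

Whenever the training loss does not care about a weight, the decay keeps shrinking it, which is one way to read the claim that weight decay favors simpler, smaller-norm solutions.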


Taught by

Yannic Kilcher

Related Courses

Practical Machine Learning
Johns Hopkins University via Coursera
Practical Deep Learning For Coders
fast.ai via Independent
機器學習基石下 (Machine Learning Foundations)---Algorithmic Foundations
National Taiwan University via Coursera
Data Analytics Foundations for Accountancy II
University of Illinois at Urbana-Champaign via Coursera
Entraînez un modèle prédictif linéaire (Train a Linear Predictive Model)
CentraleSupélec via OpenClassrooms