Superposition in LLM Feature Representations
Offered By: Conf42 via YouTube
Course Description
Overview
Explore the concept of superposition in large language model feature representations in this 47-minute conference talk from Conf42 LLMs 2024. Delve into mechanistic interpretability, neural network representations, and the qualities of those representations. Examine decomposability and linearity in depth, including linear composition as a compression scheme and the demands linearity places on a network. Investigate the linear representation puzzle and neuron-feature requirements before diving into the superposition hypothesis. Analyze the role of sparsity and learn techniques for recovering features stored in superposition. Conclude with a discussion of feature exploration in large language models.
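The superposition idea described above lends itself to a small numerical demonstration. The sketch below is not from the talk; all names and parameters are illustrative. It stores more features than a layer has dimensions by assigning each feature a random, nearly orthogonal direction, then shows that a sparse set of active features can usually be read back out with only small interference:

```python
# Toy illustration of the superposition hypothesis (illustrative only):
# more features than dimensions, each feature a nearly orthogonal direction,
# recoverable as long as few features are active at once (sparsity).
import numpy as np

rng = np.random.default_rng(0)

n_features, n_dims = 50, 20          # more features than dimensions
# Random unit vectors in a low-dimensional space are nearly orthogonal.
directions = rng.normal(size=(n_features, n_dims))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# A sparse feature vector: only 3 of the 50 features are active.
features = np.zeros(n_features)
active = rng.choice(n_features, size=3, replace=False)
features[active] = 1.0

# Linear composition: the activation is a weighted sum of feature directions.
activation = features @ directions   # shape (n_dims,)

# Naive readout: project the activation back onto every feature direction.
# Active features typically stand out; inactive ones show only small
# interference noise from the non-zero overlaps between directions.
readout = directions @ activation    # shape (n_features,)

print("active features:", sorted(active.tolist()))
print("top readout:    ", np.argsort(readout)[-3:][::-1].tolist())
```

With denser feature vectors the interference terms accumulate and the readout degrades, which is the sparsity condition the talk highlights as a prerequisite for recovering features in superposition.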
Syllabus
intro
preamble
mechanistic interpretability
neural network representations
qualities of representations
decomposability
linearity
linear composition as a compression scheme
demands of linearity
the linear representation puzzle
neuron-feature requirements
experience with llms
the superposition hypothesis
sparsity
recovering features in superposition
feature exploration
thanks
Taught by
Conf42
Related Courses
Neural Networks for Machine Learning (University of Toronto via Coursera)
Good Brain, Bad Brain: Basics (University of Birmingham via FutureLearn)
Statistical Learning with R (Stanford University via edX)
Machine Learning 1—Supervised Learning (Brown University via Udacity)
Fundamentals of Neuroscience, Part 2: Neurons and Networks (Harvard University via edX)