Open Problems in Mechanistic Interpretability: A Whirlwind Tour
Offered By: Google TechTalks via YouTube
Course Description
Overview
Embark on a comprehensive exploration of Mechanistic Interpretability in this 55-minute Google TechTalk presented by Neel Nanda. Delve into the fascinating field of reverse engineering learned algorithms in trained neural networks, with the goal of enhancing the safety and steerability of powerful systems. Gain insights into key works, promising areas of future research, and open problems in the field. Explore techniques in causal abstraction and mediation analysis, understand superposition and distributed representations, learn about model editing, and study individual circuits and neurons. Benefit from Neel's expertise as a member of the mechanistic interpretability team at Google DeepMind, drawing on his previous work with Chris Olah at Anthropic on the transformer circuits agenda and his independent research on reverse-engineering modular addition and understanding grokking.
Syllabus
Open Problems in Mechanistic Interpretability: A Whirlwind Tour
Taught by
Google TechTalks
Related Courses
Neural Networks for Machine Learning - University of Toronto via Coursera
Good Brain, Bad Brain: Basics - University of Birmingham via FutureLearn
Statistical Learning with R - Stanford University via edX
Machine Learning 1—Supervised Learning - Brown University via Udacity
Fundamentals of Neuroscience, Part 2: Neurons and Networks - Harvard University via edX