YoVDO

Open Problems in Mechanistic Interpretability: A Whirlwind Tour

Offered By: Google TechTalks via YouTube

Tags

Neural Networks Courses Circuits Courses Reverse Engineering Courses Superposition Courses

Course Description

Overview

Save Big on Coursera Plus. 7,000+ courses at $160 off. Limited Time Only!
Embark on a comprehensive exploration of Mechanistic Interpretability in this 55-minute Google TechTalk presented by Neel Nanda. Delve into the fascinating field of reverse engineering learned algorithms in trained neural networks, with the goal of enhancing the safety and steerability of powerful systems. Gain insights into key works, promising areas of future research, and open problems in the field. Explore techniques in causal abstraction and meditation analysis, understand superposition and distributed representations, learn about model editing, and study individual circuits and neurons. Benefit from Neel's expertise as a member of the mechanistic interpretability team at Google DeepMind, drawing from his previous work with Chris Olah at Anthropic on the transformer circuits agenda and his independent research on reverse-engineering modular addition and understanding grokking.

Syllabus

Open Problems in Mechanistic Interpretability: A Whirlwind Tour


Taught by

Google TechTalks

Related Courses

Neural Networks for Machine Learning
University of Toronto via Coursera
Good Brain, Bad Brain: Basics
University of Birmingham via FutureLearn
Statistical Learning with R
Stanford University via edX
Machine Learning 1—Supervised Learning
Brown University via Udacity
Fundamentals of Neuroscience, Part 2: Neurons and Networks
Harvard University via edX