Open Problems in Mechanistic Interpretability: A Whirlwind Tour

Offered By: Google TechTalks via YouTube

Course Description

Overview

Explore mechanistic interpretability in this 55-minute Google TechTalk presented by Neel Nanda. The field aims to reverse engineer the algorithms learned by trained neural networks, with the goal of making powerful systems safer and more steerable. Gain insights into key works, promising areas of future research, and open problems in the field. Topics include techniques in causal abstraction and mediation analysis, superposition and distributed representations, model editing, and the study of individual circuits and neurons. Neel is a member of the mechanistic interpretability team at Google DeepMind; the talk draws on his previous work with Chris Olah at Anthropic on the transformer circuits agenda and on his independent research on reverse-engineering modular addition and understanding grokking.

Syllabus

Open Problems in Mechanistic Interpretability: A Whirlwind Tour

Taught by

Google TechTalks
