YoVDO

Open Problems in Mechanistic Interpretability: A Whirlwind Tour

Offered By: Google TechTalks via YouTube

Tags

Neural Networks Courses, Circuits Courses, Reverse Engineering Courses, Superposition Courses

Course Description

Overview

This 55-minute Google TechTalk, presented by Neel Nanda, surveys mechanistic interpretability: the field of reverse engineering the algorithms learned by trained neural networks, with the goal of making powerful systems safer and more steerable. The talk covers key works, promising areas of future research, and open problems in the field. Topics include techniques from causal abstraction and mediation analysis, superposition and distributed representations, model editing, and the study of individual circuits and neurons. Neel is a member of the mechanistic interpretability team at Google DeepMind. He previously worked with Chris Olah at Anthropic on the transformer circuits agenda and did independent research on reverse-engineering modular addition and understanding grokking.

Syllabus

Open Problems in Mechanistic Interpretability: A Whirlwind Tour


Taught by

Google TechTalks

Related Courses

Dal Reverse engineering alla stampa 3D
University of Naples Federico II via Federica
Rapid Manufacturing
Indian Institute of Technology Kanpur via Swayam
Generative Design for Industrial Applications
Autodesk via Coursera
Fundamentos de Ciberseguridad: un enfoque práctico
Inter-American Development Bank via edX
Functional And Conceptual Design
Indian Institute of Technology Madras via Swayam