Open Problems in Mechanistic Interpretability: A Whirlwind Tour
Offered By: Google TechTalks via YouTube
Course Description
Overview
Embark on a comprehensive exploration of mechanistic interpretability in this 55-minute Google TechTalk presented by Neel Nanda. Delve into the field of reverse engineering the algorithms learned by trained neural networks, with the goal of making powerful systems safer and more steerable. Gain insights into key works, promising areas of future research, and open problems in the field. Explore techniques in causal abstraction and mediation analysis, understand superposition and distributed representations, learn about model editing, and study individual circuits and neurons. Benefit from Neel's expertise as a member of the mechanistic interpretability team at Google DeepMind, drawing on his previous work with Chris Olah at Anthropic on the transformer circuits agenda and his independent research on reverse-engineering modular addition and understanding grokking.
Syllabus
Open Problems in Mechanistic Interpretability: A Whirlwind Tour
Taught by
Google TechTalks
Related Courses
Dal Reverse engineering alla stampa 3D [From Reverse Engineering to 3D Printing] (University of Naples Federico II via Federica)
Rapid Manufacturing (Indian Institute of Technology Kanpur via Swayam)
Generative Design for Industrial Applications (Autodesk via Coursera)
Fundamentos de Ciberseguridad: un enfoque práctico [Cybersecurity Fundamentals: A Practical Approach] (Inter-American Development Bank via edX)
Functional And Conceptual Design (Indian Institute of Technology Madras via Swayam)