Formalizing Explanations of Neural Network Behaviors
Offered By: Simons Institute via YouTube
Course Description
Overview
Explore a novel approach to understanding neural network behaviors in this 59-minute lecture by Paul Christiano from the Alignment Research Center. Delve into the limitations of current mechanistic interpretability research and the challenges of formal proofs for model properties. Discover an alternative strategy for explaining specific neural network behaviors that balances between informal understanding and rigorous proofs. Gain insights into a promising research direction and theoretical questions aimed at improving AI safety and interpretability. Learn how this approach, while not as comprehensive as formal proofs, may offer comparable safety benefits in the field of AI alignment.
Syllabus
Formalizing Explanations of Neural Network Behaviors
Taught by
Simons Institute
Related Courses
Automata TheoryStanford University via edX Intro to Theoretical Computer Science
Udacity Computing: Art, Magic, Science
ETH Zurich via edX 理论计算机科学基础 | Introduction to Theoretical Computer Science
Peking University via edX Quantitative Formal Modeling and Worst-Case Performance Analysis
EIT Digital via Coursera