YoVDO

Formalizing Explanations of Neural Network Behaviors

Offered By: Simons Institute via YouTube

Tags

Neural Networks Courses
Theoretical Computer Science Courses
Interpretability Courses

Course Description

Overview

Explore a novel approach to understanding neural network behaviors in this 59-minute lecture by Paul Christiano of the Alignment Research Center. Examine the limitations of current mechanistic interpretability research and the challenges of formally proving properties of models. Discover an alternative strategy for explaining specific neural network behaviors, one that sits between informal understanding and rigorous proof. Gain insight into a promising research direction and the theoretical questions it raises for improving AI safety and interpretability. Learn how this approach, though less comprehensive than formal proofs, may offer comparable safety benefits for AI alignment.

Syllabus

Formalizing Explanations of Neural Network Behaviors


Taught by

Simons Institute

Related Courses

Neural Networks for Machine Learning
University of Toronto via Coursera
Good Brain, Bad Brain: Basics
University of Birmingham via FutureLearn
Statistical Learning with R
Stanford University via edX
Machine Learning 1—Supervised Learning
Brown University via Udacity
Fundamentals of Neuroscience, Part 2: Neurons and Networks
Harvard University via edX