Forecasting and Aligning AI - Jacob Steinhardt

Offered by: Stanford University via YouTube

Tags

Artificial Intelligence Courses Machine Learning Courses

Course Description

Overview

Modern ML systems sometimes undergo qualitative shifts in behavior simply by “scaling up” the number of parameters and training examples. Given this, how can we extrapolate the behavior of future ML systems and ensure that they behave safely and are aligned with humans? I’ll argue that we can often study (potential) capabilities of future ML systems through well-controlled experiments run on current systems, and use this as a laboratory for designing alignment techniques. I’ll also discuss some recent work on “medium-term” AI forecasting.


Syllabus

Introduction.
Rest of Talk.
Reward Hacking: Motivation.
Reward Hacking Example.
Reward Hacking: Example.
Summary of Full Results.
Reward Hacking: Summary.
Making NLP Models Truthful.
Contrastive Representation Clustering.
Results on Unified QA.
Caveat: True Answers Work Too.
Forecasting: Motivation.
Forecasting Competition.
Forecasting Questions.
Summary of Benchmark Forecasts.
Results So Far.
Forecasting: Lessons Learned.
Forecasting Class.


Taught by

Stanford Online
