Towards Structural Risk Minimization for RL - Emma Brunskill

Offered By: Institute for Advanced Study via YouTube

Tags

Reinforcement Learning, Markov Decision Processes, Concentration Inequalities

Course Description

Overview

Explore a comprehensive lecture on structural risk minimization in reinforcement learning delivered by Emma Brunskill from Stanford University at the Institute for Advanced Study. Delve into the importance of risk-sensitive control, distributional reinforcement learning, and the application of conditional value at risk for decision policies. Examine optimism under uncertainty techniques, concentration inequalities, and their implications for sample-efficient risk-sensitive reinforcement learning. Investigate optimistic exploration methods for both discrete and continuous state spaces, and review simulation experiments across various domains including machine replacement, HIV treatment, and blood glucose management. Gain insights into safer exploration strategies and discover potential future research directions in this cutting-edge field of artificial intelligence and machine learning.
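
The risk measure at the center of the lecture, conditional value at risk (CVaR), has a simple empirical form: the CVaR at level α of a batch of sampled returns is the average of the worst α-fraction of those samples. The short Python sketch below is purely illustrative and not code from the talk; the Gaussian return samples, the α = 0.25 level, and the name empirical_cvar are assumptions chosen for the example.

    import numpy as np

    def empirical_cvar(returns, alpha):
        """CVaR_alpha of sampled returns (higher return = better): the mean of the
        worst alpha-fraction of outcomes, i.e. the lower tail at or below the
        alpha-quantile."""
        z = np.sort(np.asarray(returns, dtype=float))
        k = max(1, int(np.ceil(alpha * len(z))))  # size of the worst alpha-fraction
        return float(z[:k].mean())

    # Hypothetical returns from some policy; at alpha = 0.25 the risk-sensitive
    # objective only credits the policy for its worst quarter of outcomes.
    rng = np.random.default_rng(0)
    returns = rng.normal(loc=10.0, scale=3.0, size=1000)
    print(f"mean return        : {returns.mean():.2f}")
    print(f"CVaR at alpha=0.25 : {empirical_cvar(returns, 0.25):.2f}")

A risk-neutral learner ranks policies by the first number; the risk-sensitive objective discussed in the lecture ranks them by the second.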

Syllabus

Intro
Learning through Experience...
Why is Risk Sensitive Control Important?
Risk Sensitive Reinforcement Learning
Notation: Markov Decision Process Value Function
Notation: Reinforcement Learning
Background: Distributional RL for Policy Evaluation & Control
Background: Distributional Bellman Policy Evaluation Operator for Value Based Distributional RL
Maximal Form of Wasserstein Metric on 2 Distributions
Distributional Bellman Backup Operator for Control for Maximizing Expected Reward is Not a Contraction
Goal: Quickly and Efficiently use RL to Learn a Risk-Sensitive Policy using Conditional Value at Risk
Conditional Value at Risk for a Decision Policy
For Inspiration, look to Sample Efficient Learning for Policies that Optimize Expected Reward
Optimism Under Uncertainty for Standard RL: Use Concentration Inequalities
Suggests a Path for Sample Efficient Risk Sensitive RL
Use DKW Concentration Inequality to Quantify Uncertainty over Distribution (see the sketch after this syllabus)
Creating an Optimistic Estimate of Distribution of Returns
Optimism Operator Over CDF of Returns
Optimistic Operator for Policy Evaluation Yields Optimistic Estimate
Concerns about Optimistic Risk Sensitive RL
Optimistic Exploration for Risk Sensitive RL in Continuous Spaces
Recall Optimistic Operator for Distribution of Returns for Discrete State Spaces, Uses Counts
Optimistic Operator for Distribution of Returns for Continuous State Spaces, Uses Pseudo-Counts
Simulation Experiments
Baseline Algorithms
Simulation Domains
Machine Replacement, Risk Level α = 0.25
HIV Treatment
Blood Glucose Simulator, Adult #5
Blood Glucose Simulator, 3 Patients
A Sidenote on Safer Exploration: Faster Learning also Reduces # of Bad Events During Learning
Many Interesting Open Directions
Optimism for Conservatism: Fast RL for Learning Conditional Value at Risk Policies
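
To make the DKW-based optimism steps in the syllabus concrete, the sketch below is a hedged illustration (not the lecture's code, and simplified to a fixed batch of return samples rather than a full Bellman backup). It forms the empirical CDF of returns on a support grid and shifts it down by the DKW radius sqrt(ln(2/δ)/(2n)); lowering the CDF moves probability mass toward higher returns, giving an optimistic return distribution from which an optimistic CVaR can be read off. The support grid, δ = 0.05, and the function names are illustrative assumptions.

    import numpy as np

    def optimistic_return_cdf(samples, support, delta=0.05):
        """Return the empirical CDF of the samples on a support grid and a DKW-shifted
        optimistic CDF. With probability >= 1 - delta the true CDF is everywhere within
        eps = sqrt(ln(2/delta) / (2 n)) of the empirical one, so shifting down by eps
        favors higher returns."""
        samples = np.asarray(samples, dtype=float)
        n = len(samples)
        eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))   # DKW concentration radius
        emp_cdf = np.array([(samples <= x).mean() for x in support])
        opt_cdf = np.clip(emp_cdf - eps, 0.0, 1.0)
        opt_cdf[-1] = 1.0  # keep a valid CDF: the removed mass sits at the top of the support
        return emp_cdf, opt_cdf

    def cvar_from_cdf(cdf, support, alpha):
        """CVaR_alpha of a discrete distribution given by its CDF on a support grid:
        the mean of the lowest alpha probability mass."""
        pmf = np.diff(np.concatenate(([0.0], cdf)))
        tail = np.minimum(np.cumsum(pmf), alpha)          # cumulative mass capped at alpha
        tail_pmf = np.diff(np.concatenate(([0.0], tail)))
        return float(np.dot(tail_pmf, support) / alpha)

    rng = np.random.default_rng(1)
    samples = rng.normal(loc=10.0, scale=3.0, size=200)   # hypothetical sampled returns
    support = np.linspace(0.0, 20.0, 201)
    emp, opt = optimistic_return_cdf(samples, support)
    print(cvar_from_cdf(emp, support, 0.25))              # empirical CVaR
    print(cvar_from_cdf(opt, support, 0.25))              # optimistic CVaR, never smaller

As the syllabus notes, the continuous-state version replaces visit counts with pseudo-counts when setting the width of such a confidence shift.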


Taught by

Institute for Advanced Study

Related Courses

Deep Learning and Python Programming for AI with Microsoft Azure
Cloudswyft via FutureLearn
Advanced Artificial Intelligence on Microsoft Azure: Deep Learning, Reinforcement Learning and Applied AI
Cloudswyft via FutureLearn
Overview of Advanced Methods of Reinforcement Learning in Finance
New York University (NYU) via Coursera
AI for Cybersecurity
Johns Hopkins University via Coursera
人工智慧:機器學習與理論基礎 (Artificial Intelligence - Learning & Theory)
National Taiwan University via Coursera