Better Learning from the Past - Counterfactual - Batch RL
Offered By: Simons Institute via YouTube
Course Description
Overview
Syllabus
Intro
Sequential Decision Making Under Uncertainty
Learning to Make Good Sequences of Decisions Under Uncertainty → 1980s Reinforcement Learning
Background: Markov Decision Process Value Function
Background: Reinforcement Learning
Counterfactual / Batch Off-Policy Reinforcement Learning
Need for Generalization
Growing Interest in Causal Inference & ML
Batch / Counterfactual Policy Optimization: Pick Policy w/Best Estimated Expected Sum of Rewards
Quest: Batch Policy Optimization w/ Generalization Bounds
Challenge: Good Error Bound Analysis
Aim: Strong Generalization Guarantees on Policy Performance, Alternative: Guarantee Find Good in Class Policy
Off-Policy Policy Gradient with State Distribution Correction
Aim: Strong Generalization Guarantees on Policy Performance, Alternative: Guarantee Find Best in Class Policy
Example: Linear Thresholding Policies Starting HIV treatment as soon as
Use an Advantage Decomposition
Use a Doubly Robust Advantage Decomposition
Quest for Batch Policy Optimization with Generalization Guarantees
Techniques to Minimize & Understand Data Needed to Learn to Make Good Decisions
Taught by
Simons Institute
Related Courses
Toward Generalizable Embodied AI for Machine Autonomy
Bolei Zhou via YouTube
What Are the Statistical Limits of Offline Reinforcement Learning With Function Approximation?
Simons Institute via YouTube
Off-Policy Policy Optimization
Simons Institute via YouTube
Provably Efficient Reinforcement Learning with Linear Function Approximation - Chi Jin
Institute for Advanced Study via YouTube
Divide-and-Conquer Monte Carlo Tree Search for Goal-Directed Planning - Paper Explained
Yannic Kilcher via YouTube