On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments
Offered By: Simons Institute via YouTube
Course Description
Overview
Explore a lecture on off-policy evaluation (OPE) in non-Markov environments, focusing on the challenge of coverage in partially observable Markov decision processes (POMDPs). Delve into the framework of future-dependent value functions, and learn about the belief coverage and outcome coverage assumptions tailored to POMDP structure. Discover how these concepts yield the first polynomial sample complexity guarantee for OPE in POMDPs, overcoming limitations of traditional Markov-based approaches. Gain insight into practical implications for real-world reinforcement learning, including reinforcement learning from human feedback (RLHF) in large language models.
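For orientation, here is a minimal sketch of the lecture's central object, with notation assumed for illustration rather than quoted from the talk itself. A future-dependent value function is any function g over observable futures F whose conditional mean recovers the (unobservable) latent-state value of the evaluation policy \pi^e:

    E[ g(F) | S = s ] = V^{\pi^e}(s)   for every latent state s.

Because the latent state S is never observed, g is fit from logged data through a conditional moment restriction over observable histories H,

    E[ \mu(O, A) ( R + \gamma g(F') ) - g(F) | H ] = 0,

where \mu(O, A) = \pi^e(A | O) / \pi^b(A | O) is the per-step importance ratio against the behavior policy \pi^b. Roughly speaking, belief coverage and outcome coverage are the conditions under which this restriction pins down g well enough to identify the policy value from polynomially many samples.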
Syllabus
On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments
Taught by
Simons Institute
Related Courses
Computational Neuroscience - University of Washington via Coursera
Reinforcement Learning - Brown University via Udacity
Reinforcement Learning - Indian Institute of Technology Madras via Swayam
FA17: Machine Learning - Georgia Institute of Technology via edX
Introduction to Reinforcement Learning - Higher School of Economics via Coursera