On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments
Offered By: Simons Institute via YouTube
Course Description
Overview
Explore a lecture on off-policy evaluation in non-Markov environments, focusing on why standard coverage assumptions break down in partially observable Markov decision processes (POMDPs). Delve into the novel framework of future-dependent value functions and learn about belief coverage and outcome coverage, assumptions tailored to the structure of POMDPs. Discover how these concepts enable the first polynomial sample complexity guarantee for off-policy evaluation in POMDPs, overcoming the limitations of traditional Markov-based approaches. Gain insights into the practical implications for real-world applications of reinforcement learning, including reinforcement learning from human feedback (RLHF) in large language models.
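For readers unfamiliar with the setup: off-policy evaluation (OPE) estimates the value of a target policy using only data logged by a different behavior policy. A standard baseline is trajectory-wise importance sampling, whose correction weight multiplies one probability ratio per time step, so its variance can grow exponentially with the horizon; this is one way to see the "curse of history" that motivates the lecture. The sketch below illustrates that generic baseline only, not the future-dependent method from the talk; the callables pi_e and pi_b and the undiscounted return are illustrative assumptions.

```python
import numpy as np

def trajectory_is_estimate(trajectories, pi_e, pi_b):
    """Generic trajectory-wise importance sampling OPE baseline (a sketch).

    trajectories: list of episodes, each a list of (obs, action, reward) tuples
    pi_e(action, obs): probability of `action` under the evaluation policy
    pi_b(action, obs): probability of `action` under the behavior policy
    """
    estimates = []
    for episode in trajectories:
        weight, ret = 1.0, 0.0
        for obs, action, reward in episode:
            # One ratio per step: the cumulative weight depends on the
            # entire history, so its variance can grow exponentially
            # with the horizon (the "curse of history").
            weight *= pi_e(action, obs) / pi_b(action, obs)
            ret += reward  # undiscounted return, for simplicity
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

By contrast, the future-dependent value functions discussed in the lecture replace such history-long corrections with quantities that, under the belief and outcome coverage assumptions, admit a polynomial rather than exponential sample complexity guarantee.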
Syllabus
On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments
Taught by
Simons Institute
Related Courses
Beyond Worst-Case Analysis - Panel Discussion (Simons Institute via YouTube)
Reinforcement Learning - Part I (Simons Institute via YouTube)
Reinforcement Learning in Feature Space: Complexity and Regret (Simons Institute via YouTube)
Exploration with Limited Memory - Streaming Algorithms for Coin Tossing, Noisy Comparisons, and Multi-Armed Bandits (Association for Computing Machinery (ACM) via YouTube)
Optimal Transport for Machine Learning - Gabriel Peyre, Ecole Normale Superieure (Alan Turing Institute via YouTube)