
On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments

Offered By: Simons Institute via YouTube

Tags

Reinforcement Learning Courses
Sample Complexity Courses

Course Description

Overview

Explore a lecture on off-policy evaluation (OPE) in non-Markov environments, focusing on the challenge of coverage in partially observable Markov decision processes (POMDPs). Delve into the framework of future-dependent value functions and learn about belief coverage and outcome coverage, two assumptions tailored to POMDP structure. Discover how these concepts yield the first polynomial sample complexity guarantee for off-policy evaluation in POMDPs, overcoming the limitations of traditional Markov-based approaches. Gain insight into practical implications for real-world applications of reinforcement learning, including reinforcement learning from human feedback (RLHF) for large language models.
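
To make the core idea concrete, here is a minimal sketch (illustrative, not code from the lecture) of how a future-dependent value function might be fit with linear features, using history features as instruments for a moment restriction of the kind the framework describes. The feature maps psi and phi, the discount factor, the ridge regularizer, and the synthetic data are all assumptions made for demonstration.

    import numpy as np

    def fdvf_ope(psi_h, phi_f, phi_f_next, rho, r, phi_f0, gamma=0.95, reg=1e-3):
        """Off-policy value estimate via a linear future-dependent value
        function g(f) = theta . phi(f), fit from the (assumed) moment condition
            E[ psi(H) * (rho * (R + gamma * g(F')) - g(F)) ] = 0,
        where rho = pi_e(a|o) / pi_b(a|o) is the per-step importance ratio."""
        n, d_f = phi_f.shape
        # Empirical version of the instrumented moment system A @ theta = b.
        A = psi_h.T @ (phi_f - gamma * rho[:, None] * phi_f_next) / n  # (d_h, d_f)
        b = psi_h.T @ (rho * r) / n                                    # (d_h,)
        # Ridge-regularized least squares handles a rectangular, ill-posed system.
        theta = np.linalg.solve(A.T @ A + reg * np.eye(d_f), A.T @ b)
        # The policy value is estimated by averaging g over initial-step futures.
        return float((phi_f0 @ theta).mean())

    # Shape check on synthetic placeholder data (not a meaningful benchmark).
    rng = np.random.default_rng(0)
    n, d_h, d_f = 1000, 8, 6
    print(fdvf_ope(rng.normal(size=(n, d_h)), rng.normal(size=(n, d_f)),
                   rng.normal(size=(n, d_f)), np.exp(rng.normal(0, 0.1, n)),
                   rng.normal(size=n), rng.normal(size=(100, d_f))))

The design point the sketch tries to convey is that histories enter only as instruments, not as inputs to the value function itself, which is how the approach sidesteps the "curse of history"; everything beyond that structural idea is schematic.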

Syllabus

On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments


Taught by

Simons Institute

Related Courses

Beyond Worst-Case Analysis - Panel Discussion
Simons Institute via YouTube
Reinforcement Learning - Part I
Simons Institute via YouTube
Reinforcement Learning in Feature Space: Complexity and Regret
Simons Institute via YouTube
Exploration with Limited Memory - Streaming Algorithms for Coin Tossing, Noisy Comparisons, and Multi-Armed Bandits
Association for Computing Machinery (ACM) via YouTube
Optimal Transport for Machine Learning - Gabriel Peyré, École Normale Supérieure
Alan Turing Institute via YouTube