On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments

Offered By: Simons Institute via YouTube

Tags

Reinforcement Learning Courses
Sample Complexity Courses

Course Description

Overview

Explore a lecture on off-policy evaluation (OPE) in non-Markov environments, focusing on the challenge of coverage in partially observable Markov decision processes (POMDPs). Delve into the novel framework of future-dependent value functions and learn about the belief coverage and outcome coverage assumptions tailored to POMDP structure. Discover how these concepts enable the first polynomial sample complexity guarantee for off-policy evaluation in POMDPs, overcoming the limitations of traditional Markov-based approaches. Gain insights into the practical implications for real-world applications of reinforcement learning, including reinforcement learning from human feedback (RLHF) in large language models.
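
To see why history is a "curse" for naive OPE in this setting, note that estimators which importance-weight entire observation histories accumulate a weight that is a product over time steps, so its variance can grow exponentially with the horizon. The sketch below (every policy, dynamics choice, and name here is an illustrative assumption, not material from the talk) runs trajectory-wise importance sampling on a toy two-state POMDP and prints how the estimate's spread blows up as the horizon grows; closing this gap with polynomial sample complexity is the motivation for the lecture's future-dependent value functions and coverage assumptions.

    import numpy as np

    # Hypothetical toy POMDP: two latent states, noisy binary observations.
    rng = np.random.default_rng(0)

    def rollout(policy, horizon):
        """Simulate one episode; return per-step (observation, action, reward)."""
        s = rng.integers(2)                           # latent state, never observed
        traj = []
        for _ in range(horizon):
            o = s if rng.random() < 0.8 else 1 - s    # noisy observation of the state
            a = policy(o)
            r = float(a == s)                         # reward for guessing the state
            traj.append((o, a, r))
            s = 1 - s if rng.random() < 0.3 else s    # latent state transition
        return traj

    # Behavior policy: uniform random; target policy: trusts the observation.
    def pi_b(o): return rng.integers(2)
    def pi_b_prob(a, o): return 0.5
    def pi_e_prob(a, o): return 0.9 if a == o else 0.1

    def is_estimate(horizon, n_episodes=5000):
        """Trajectory-wise importance-sampling estimate of the target return."""
        returns = []
        for _ in range(n_episodes):
            traj = rollout(pi_b, horizon)
            w = 1.0
            for (o, a, _) in traj:
                # The weight is a product over steps, so it can reach (0.9/0.5)^H.
                w *= pi_e_prob(a, o) / pi_b_prob(a, o)
            returns.append(w * sum(r for (_, _, r) in traj))
        return np.mean(returns), np.std(returns)

    for H in (2, 5, 10, 20):
        mean, std = is_estimate(H)
        print(f"horizon={H:2d}  IS estimate={mean:6.2f}  std={std:8.1f}")

Running the sketch shows the standard deviation of the estimate growing multiplicatively with the horizon even in this two-state example, which is the history-side curse the lecture's POMDP-specific coverage conditions are designed to avoid.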

Syllabus

On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments


Taught by

Simons Institute

Related Courses

Computational Neuroscience
University of Washington via Coursera
Reinforcement Learning
Brown University via Udacity
Reinforcement Learning
Indian Institute of Technology Madras via Swayam
FA17: Machine Learning
Georgia Institute of Technology via edX
Introduction to Reinforcement Learning
Higher School of Economics via Coursera