YoVDO

Maximum Entropy Reinforcement Learning

Offered By: Pascal Poupart via YouTube

Tags

Reinforcement Learning Courses

Course Description

Overview

Explore maximum entropy reinforcement learning in this 42-minute lecture from Pascal Poupart's CS885 course at the University of Waterloo. Delve into key concepts such as encouraging stochasticity, optimal policy, Q-function, and greedy policy. Learn about soft Q-value iteration, soft Q-learning, and soft policy iteration, including policy improvement and proof derivations. Examine the Soft Actor-Critic (SAC) algorithm and its empirical results, with a focus on robustness to environment changes. Access accompanying slides on the course website for a comprehensive understanding of this advanced reinforcement learning topic.
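The soft Q-value iteration covered in the lecture can be sketched in a few lines. The toy MDP below (random transitions and rewards, temperature `alpha`) is an invented illustration, not the lecture's example; it shows the soft Bellman backup and the resulting stochastic (soft greedy) policy.

```python
import numpy as np

n_states, n_actions = 3, 2
gamma, alpha = 0.9, 0.5  # discount factor and entropy temperature

rng = np.random.default_rng(0)
# Toy transition model P[s, a, s'] and reward table R[s, a] (invented data).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
for _ in range(500):
    # Soft value function: V(s) = alpha * log sum_a exp(Q(s, a) / alpha)
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    # Soft Bellman backup: Q(s, a) = R(s, a) + gamma * E_{s'}[V(s')]
    Q = R + gamma * P @ V

# Maximum-entropy policy: pi(a|s) proportional to exp(Q(s, a) / alpha).
# Unlike a hard greedy policy, every action keeps nonzero probability,
# which is the "encouraging stochasticity" idea from the lecture.
pi = np.exp(Q / alpha)
pi /= pi.sum(axis=1, keepdims=True)
```

As `alpha` shrinks toward zero the soft value approaches the hard max and the policy approaches the ordinary greedy policy; larger `alpha` weights entropy more heavily.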

Syllabus

Intro
Maximum Entropy RL
Reinforcement Learning
Encouraging Stochasticity
Optimal Policy
Q-function
Greedy Policy
Greedy Value function
Soft Q-Value Iteration
Soft Q-learning
Soft Policy Iteration
Policy improvement
Inequality derivation
Proof derivation
Soft Actor-Critic (SAC)
Empirical Results
Robustness to Environment Changes


Taught by

Pascal Poupart

Related Courses

Computational Neuroscience
University of Washington via Coursera
Reinforcement Learning
Brown University via Udacity
Reinforcement Learning
Indian Institute of Technology Madras via Swayam
FA17: Machine Learning
Georgia Institute of Technology via edX
Introduction to Reinforcement Learning
Higher School of Economics via Coursera