Trust Region & Proximal Policy Optimization

Offered By: Pascal Poupart via YouTube

Tags

Reinforcement Learning Courses
Constrained Optimization Courses
Kullback-Leibler Divergence Courses

Course Description

Overview

Explore trust region methods and proximal policy optimization in this 22-minute video lecture from the CS885 course at the University of Waterloo. The lecture reviews policy gradient optimization, introduces the Kullback-Leibler divergence, and derives the Trust Region Policy Optimization (TRPO) algorithm as a constrained optimization problem. It then presents the simpler objective used by Proximal Policy Optimization (PPO) and closes with empirical results and an illustration. Accompanying slides are available from the course website.
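
As a rough illustration of two ideas named above, the sketch below computes the Kullback-Leibler divergence between two discrete action distributions (the kind of quantity a trust region constraint bounds) and a clipped surrogate objective of the form commonly associated with PPO. It is not taken from the lecture or its slides: the function names, the example probabilities, and the clipping value epsilon = 0.2 are assumptions chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Average of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over samples.

    ratios: new-policy / old-policy action probabilities, one per sample.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    old_policy = [0.5, 0.5]  # hypothetical action probabilities
    new_policy = [0.6, 0.4]
    print("KL(old || new):", kl_divergence(old_policy, new_policy))

    # Hypothetical probability ratios and advantages for two samples.
    print("Clipped objective:", ppo_clipped_objective([1.5, 0.5], [1.0, -1.0]))
```

Taking the minimum of the clipped and unclipped terms means the policy gains nothing from pushing the probability ratio beyond the interval [1 - epsilon, 1 + epsilon], which plays a role similar to the explicit KL constraint in TRPO while being simpler to optimize.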

Syllabus

Gradient policy optimization
Recall Policy Gradient
Trust region method
Trust region for policies
Kullback-Leibler Divergence
Reformulation
Derivation (continued)
Trust Region Policy Optimization (TRPO)
Constrained Optimization
Simpler Objective
Proximal Policy Optimization (PPO)
Empirical Results
Illustration


Taught by

Pascal Poupart

Related Courses

Constrained And Unconstrained Optimization
Indian Institute of Technology, Kharagpur via Swayam
Constrained and Unconstrained Optimization
NPTEL via YouTube
Physics of Functional Networks - Henrik Ronellenfitsch
Institute for Advanced Study via YouTube
Calculus 3 Lecture - Constrained Optimization with LaGrange Multipliers
Professor Leonard via YouTube
Constrained Optimization on Riemannian Manifolds
Simons Institute via YouTube