Trust Region & Proximal Policy Optimization
Offered By: Pascal Poupart via YouTube
Course Description
Overview
Explore trust region methods and proximal policy optimization in this 22-minute video lecture from the CS885 course at the University of Waterloo. Delve into gradient policy optimization, Kullback-Leibler Divergence, and the Trust Region Policy Optimization (TRPO) algorithm. Learn about constrained optimization and the simplified objective of Proximal Policy Optimization (PPO). Examine empirical results and illustrations to reinforce your understanding of these advanced reinforcement learning concepts. Access accompanying slides from the course website for a comprehensive learning experience.
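For quick reference, the sketch below writes out the standard TRPO constrained objective and the PPO clipped surrogate in generic notation (policy ratio r_t, advantage estimate \hat{A}_t, trust-region radius \delta, clip parameter \epsilon); these are the conventional textbook formulations and the symbols may not match the notation used on the lecture slides.

```latex
% Probability ratio between the updated and the old policy
\[
  r_t(\theta) \;=\; \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
\]

% TRPO: maximize the surrogate objective inside a KL-divergence trust region
\[
  \max_{\theta} \; \mathbb{E}_t\big[\, r_t(\theta)\, \hat{A}_t \,\big]
  \quad \text{subject to} \quad
  \mathbb{E}_t\big[\, D_{\mathrm{KL}}\!\big(\pi_{\theta_{\text{old}}}(\cdot \mid s_t)\,\big\|\,\pi_\theta(\cdot \mid s_t)\big) \,\big] \;\le\; \delta
\]

% PPO: replace the hard constraint with a clipped ("simpler") objective
\[
  L^{\text{CLIP}}(\theta) \;=\; \mathbb{E}_t\Big[\, \min\!\big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\!\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \big) \,\Big]
\]
```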
Syllabus
Gradient policy optimization
Recall Policy Gradient
Trust region method
Trust region for policies
Kullback-Leibler Divergence
Reformulation
Derivation (continued)
Trust Region Policy Optimization (TRPO)
Constrained Optimization
Simpler Objective
Proximal Policy Optimization (PPO) (see the sketch after this syllabus)
Empirical Results
Illustration
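As a companion to the "Simpler Objective" and "Proximal Policy Optimization (PPO)" syllabus items above, here is a minimal NumPy sketch of the clipped surrogate objective. The function name ppo_clipped_loss, the array-based interface, and the default epsilon=0.2 are illustrative assumptions, not anything taken from the lecture.

```python
import numpy as np

def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, epsilon=0.2):
    """Minimal sketch of the PPO clipped surrogate objective (to be maximized).

    All arguments are 1-D NumPy arrays over sampled timesteps; the argument
    names and the default epsilon are illustrative assumptions.
    """
    # Probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)
    ratio = np.exp(log_probs_new - log_probs_old)

    # Unclipped and clipped surrogate terms
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages

    # Taking the elementwise minimum removes the incentive to push the
    # ratio far outside [1 - epsilon, 1 + epsilon]
    return np.mean(np.minimum(unclipped, clipped))


# Example usage with dummy data
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lp_new = rng.normal(-1.0, 0.1, size=128)
    lp_old = rng.normal(-1.0, 0.1, size=128)
    adv = rng.normal(0.0, 1.0, size=128)
    print("clipped surrogate:", ppo_clipped_loss(lp_new, lp_old, adv))
```

This clipped objective is the "simpler" alternative to TRPO's explicit KL constraint: it can be optimized with ordinary stochastic gradient methods instead of a constrained second-order update.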
Taught by
Pascal Poupart
Related Courses
Evaluation of Adaptive Systems (Association for Computing Machinery (ACM) via YouTube)
Topology of Surprisal - Information Theory and Vietoris-Rips Filtrations (Applied Algebraic Topology Network via YouTube)
The Key Equation Behind Probability - Entropy, Cross-Entropy, and KL Divergence (Artem Kirsanov via YouTube)
Variational Inference and Optimization - Lecture 1 (Probabilistic AI School via YouTube)
New Directions in Quantum State Learning and Testing (QuICS via YouTube)