Trust Region Policy Optimization
Offered By: Pascal Poupart via YouTube
Course Description
Overview
Explore the Trust Region Policy Optimization (TRPO) algorithm in this 23-minute lecture presented by Shivam Kalra. The lecture reviews reinforcement learning, examines the problems of policy gradient methods, and recasts reinforcement learning as an optimization problem, including the question of which loss to optimize. It then covers the Minorization Maximization (MM) algorithm, the KL-penalized problem and how to solve it with the Conjugate Gradient (CG) method, and the KL-constrained formulation that leads to the TRPO algorithm.
Syllabus
Intro
Reinforcement Learning
Problems of Policy Gradient
RL to Optimization
What loss to optimize?
New State Visitation is Difficult
Minorization Maximization (MM) algorithm
Solving KL-Penalized Problem
Conjugate Gradient (CG)
TRPO: KL-Constrained
TRPO Algorithm
Taught by
Pascal Poupart
Related Courses
Deep Learning and Python Programming for AI with Microsoft Azure
Cloudswyft via FutureLearn
Advanced Artificial Intelligence on Microsoft Azure: Deep Learning, Reinforcement Learning and Applied AI
Cloudswyft via FutureLearn
Overview of Advanced Methods of Reinforcement Learning in Finance
New York University (NYU) via Coursera
AI for Cybersecurity
Johns Hopkins University via Coursera
人工智慧:機器學習與理論基礎 (Artificial Intelligence - Learning & Theory)
National Taiwan University via Coursera