Optimal Gradient-Based Algorithms for Non-Concave Bandit Optimization
Offered By: Simons Institute via YouTube
Course Description
Overview
Explore a 31-minute lecture by Qi Lei from Princeton University on optimal gradient-based algorithms for non-concave bandit optimization. Delve into advanced topics including the stochastic bandit eigenvector problem, noisy power methods, and stochastic low-rank linear rewards. Examine information theoretical understanding, regret comparisons for quadratic rewards, and higher-order polynomial bandit problems. Learn about optimal regret for different types of reward functions and potential extensions to reinforcement learning in simulator settings. Gain insights into cutting-edge research in sampling algorithms and geometries on probability distributions presented at the Simons Institute.
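Among the techniques the lecture covers is the noisy power method for the stochastic bandit eigenvector problem. As a rough illustration (not the lecture's actual algorithm or analysis), the core idea of power iteration under noisy matrix-vector queries can be sketched as follows; the matrix `M`, noise level, and iteration count here are made-up toy values:

```python
import numpy as np

def noisy_power_method(matvec, d, iters=50, seed=None):
    """Estimate the top eigenvector of a symmetric matrix M
    given only noisy matrix-vector queries matvec(v) ~ M @ v."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = matvec(v)               # noisy oracle query approximating M @ v
        v = w / np.linalg.norm(w)   # renormalize to stay on the unit sphere
    return v

# Toy check: a rank-one signal plus a small Gaussian query noise.
rng = np.random.default_rng(0)
u = rng.standard_normal(8)
u /= np.linalg.norm(u)
M = 5.0 * np.outer(u, u) + 0.1 * np.eye(8)
noisy = lambda v: M @ v + 0.01 * rng.standard_normal(8)
v_hat = noisy_power_method(noisy, 8, seed=1)
print(abs(v_hat @ u))  # close to 1: the iterate aligns with the top eigenvector
```

With a large eigenvalue gap and low query noise, the iterate aligns with the top eigenvector; the lecture's contribution concerns how to do this sample-efficiently in the bandit setting, which this sketch does not capture.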
Syllabus
Intro
Bandit Problem
Our focus: beyond linearity and concavity
Problem I: The Stochastic Bandit Eigenvector Problem
Some related work
Information theoretical understanding
Beyond cubic dimension dependence
Our method: noisy power method
Problem II: Stochastic Low-rank Linear Reward
Our algorithm: noisy subspace iteration
Regret comparisons: quadratic reward
Higher-order problems
Problem III: Symmetric High-order Polynomial Bandit
Problem IV: Asymmetric High-order Polynomial Bandit
Lower bound: Optimal dependence on a
Overall Regret Comparisons
Extension to RL in simulator setting
Conclusions: We find optimal regret for different types of reward functions
Future directions
Taught by
Simons Institute
Related Courses
Introduction to Artificial Intelligence - Stanford University via Udacity
Natural Language Processing - Columbia University via Coursera
Probabilistic Graphical Models 1: Representation - Stanford University via Coursera
Computer Vision: The Fundamentals - University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course) - California Institute of Technology via Independent