Online Learning and Bandits - Part 2
Offered By: Simons Institute via YouTube
Course Description
Overview
Delve into the intricacies of online learning and bandit algorithms in this comprehensive lecture from the Theory of Reinforcement Learning Boot Camp. Explore fundamental concepts such as the basic bandit game, regret analysis, and adversarial protocols. Learn about key algorithm design principles, including exponential weights, optimism in the face of uncertainty, and probability matching. Examine popular algorithms like Exp3, UCB, and Thompson Sampling, along with their analyses and upper bounds. Investigate advanced topics such as best of both worlds scenarios, successive elimination, and linear contextual bandits. Gain insights from experts Alan Malek of DeepMind and Wouter Koolen from Centrum Wiskunde & Informatica as they guide you through this essential area of reinforcement learning theory.
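As a taste of the "optimism in the face of uncertainty" principle the lecture covers, here is a minimal UCB1 sketch for a Bernoulli bandit. The arm means, horizon, and function name below are illustrative assumptions for this sketch, not taken from the talk itself.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit with the given (hypothetical) arm means.

    Each round, pull the arm maximizing
        empirical mean + sqrt(2 * log(t) / pulls),
    i.e. an optimistic upper confidence bound on its true mean.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # total reward per arm
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]   # expected (pseudo-)regret
    return counts, regret

counts, regret = ucb1([0.3, 0.5, 0.7], horizon=2000)
```

Over 2000 rounds the best arm (mean 0.7) should receive the large majority of pulls, while the confidence bonus keeps occasional exploration of the others, which is the behavior analyzed in the "UCB: Analysis" segment.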
Syllabus
Intro
The Basic Bandit Game
Bandits Are Super Simple MDPs
The Regret
Adversarial Protocol
Algorithm Design Principle: Exponential Weights
Exp3: Abridged Analysis
Exp3: Analysis
Upgrades
Warm-up: Explore-Then-Commit
Algorithm Design Principle: OFU
UCB Illustration
UCB: Analysis
Algorithm Design Principle: Probability Matching
Thompson Sampling: Overview
Thompson Sampling: Upper Bound
Thompson Sampling: Proof Outline
Best of Both Worlds
Two Settings
Algorithm Design Principle: Action Elimination
Successive Elimination Analysis
Bonus: Linear Contextual Bandits
Algorithm Design Principle: Optimism
Review
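The exponential-weights principle behind Exp3, listed in the syllabus above, can be sketched as follows. The reward function, learning parameter, and horizon are illustrative assumptions for this sketch rather than details from the lecture.

```python
import math
import random

def exp3(reward_fn, k, horizon, gamma=0.1, seed=0):
    """Exp3 sketch: exponential weights with importance-weighted estimates.

    Maintains weights over k arms, samples an arm from the gamma-mixed
    distribution, then multiplies the pulled arm's weight by
    exp(gamma * r_hat / k), where r_hat = reward / p(arm) is an
    unbiased estimate of that arm's reward (rewards assumed in [0, 1]).
    """
    rng = random.Random(seed)
    weights = [1.0] * k
    total = 0.0
    for t in range(horizon):
        s = sum(weights)
        # Mix exponential weights with uniform exploration of rate gamma.
        probs = [(1 - gamma) * w / s + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        r_hat = reward / probs[arm]   # importance-weighted estimate
        weights[arm] *= math.exp(gamma * r_hat / k)
    return total

# Sanity check on a trivial (hypothetical) environment:
# arm 1 always pays 1, the other arms pay 0.
payout = exp3(lambda t, a: 1.0 if a == 1 else 0.0, k=3, horizon=3000)
```

Because the reward estimates are importance-weighted, they remain unbiased even against an adversarial reward sequence, which is what makes the regret analysis in the "Exp3: Analysis" segment go through.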
Taught by
Simons Institute
Related Courses
Deep Learning and Python Programming for AI with Microsoft Azure
Cloudswyft via FutureLearn
Advanced Artificial Intelligence on Microsoft Azure: Deep Learning, Reinforcement Learning and Applied AI
Cloudswyft via FutureLearn
Overview of Advanced Methods of Reinforcement Learning in Finance
New York University (NYU) via Coursera
AI for Cybersecurity
Johns Hopkins University via Coursera
人工智慧:機器學習與理論基礎 (Artificial Intelligence - Learning & Theory)
National Taiwan University via Coursera