Online Learning and Bandits - Part 2

Offered By: Simons Institute via YouTube

Tags

Reinforcement Learning Courses, Online Learning Courses, Thompson Sampling Courses

Course Description

Overview

Delve into the intricacies of online learning and bandit algorithms in this comprehensive lecture from the Theory of Reinforcement Learning Boot Camp. Explore fundamental concepts such as the basic bandit game, regret analysis, and adversarial protocols. Learn about key algorithm design principles, including exponential weights, optimism in the face of uncertainty, and probability matching. Examine popular algorithms like Exp3, UCB, and Thompson Sampling, along with their analyses and upper bounds. Investigate advanced topics such as best of both worlds scenarios, successive elimination, and linear contextual bandits. Gain insights from experts Alan Malek of DeepMind and Wouter Koolen from Centrum Wiskunde & Informatica as they guide you through this essential area of reinforcement learning theory.
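To give a concrete flavor of the "optimism in the face of uncertainty" principle and the UCB algorithm named above, here is a minimal Python sketch of the UCB1 rule on a stochastic Bernoulli bandit. The reward model, exploration bonus, and horizon are assumptions chosen for illustration and are not taken from the lecture itself; a companion Thompson Sampling sketch appears after the syllabus below.

```python
import math
import random

def ucb1(arm_means, horizon=10_000, seed=0):
    """Minimal UCB1 sketch on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # cumulative reward per arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # pull each arm once to initialize
        else:
            # optimism: empirical mean plus a confidence bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    regret = horizon * max(arm_means) - total_reward
    return regret

print(ucb1([0.3, 0.5, 0.7]))
```

The confidence bonus shrinks as an arm is sampled, so suboptimal arms are pulled only logarithmically often; roughly, this is the shape of argument behind the "UCB: Analysis" item in the syllabus below.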

Syllabus

Intro
The Basic Bandit Game
Bandits are Super Simple MDPs
The Regret
Adversarial Protocol
Algorithm Design Principle: Exponential Weights
Exp3: Abridged Analysis
Exp3: Analysis
Upgrades
Warm-up: Explore-Then-Commit
Algorithm Design Principle: OFU
UCB Illustration
UCB: Analysis
Algorithm Design Principle: Probability Matching
Thompson Sampling: Overview
Thompson Sampling: Upper Bound
Thompson Sampling: Proof Outline
Best of Both Worlds
Two Settings
Algorithm Design Principle: Action Elimination
Successive Elimination Analysis
Bonus: Linear Contextual Bandits
Algorithm Design Principle: Optimism
Review
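
The probability-matching principle behind the Thompson Sampling items in the syllabus can likewise be summarized in a few lines. The sketch below uses the textbook Beta-Bernoulli version; the uniform prior, reward model, and horizon are assumptions made for illustration and not necessarily the variant analyzed in the lecture.

```python
import random

def thompson_sampling(arm_means, horizon=10_000, seed=0):
    """Beta-Bernoulli Thompson Sampling sketch (illustrative only)."""
    rng = random.Random(seed)
    k = len(arm_means)
    alpha = [1.0] * k          # Beta(1, 1) uniform prior per arm
    beta = [1.0] * k
    total_reward = 0.0

    for _ in range(horizon):
        # probability matching: sample a mean from each arm's posterior,
        # then play the arm whose sampled mean is largest
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        alpha[arm] += reward          # posterior update on success
        beta[arm] += 1.0 - reward     # posterior update on failure
        total_reward += reward

    regret = horizon * max(arm_means) - total_reward
    return regret

print(thompson_sampling([0.3, 0.5, 0.7]))
```

Each round plays an arm with exactly the posterior probability that it is optimal, which is the sense in which the algorithm "matches" probabilities; the lecture's upper-bound and proof-outline segments make that intuition precise.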


Taught by

Simons Institute
