Reinforcement Learning in Recommender Systems - Some Challenges

Offered By: Simons Institute via YouTube



Course Description


Explore the challenges of applying reinforcement learning to recommender systems in this 52-minute lecture by Craig Boutilier from Google and the University of Toronto. Delve into key issues such as scaling for multiple users and actions, handling stochastic action sets, and addressing user learning over long horizons. Examine the MDP/RL formulation for maximizing user engagement, and investigate item interactions on recommendation slates. Learn about user choice assumptions, Q-learning decomposition, and slate optimization techniques. Analyze synthetic experiments and the robustness of models to different user choice behaviors, including the cascade model.


RL in User-Facing/Interactive Systems - RL has found tremendous success with deep models
Some Challenges in User-Facing RL (RecSys) - Scale: number of users (multi-user MDPs) and actions (combinatorial, slates); idiosyncratic nature of actions
I. Stochastic Action Sets
SAS-MDPs: Constructing an MDP
SAS-MDPs: Solving Extended MDP
II. User Learning over Long Horizons - Evidence of (very) slow user learning and adaptation
Advantage Amplification - Temporal aggregation (e.g., fixed actions) can help amplify advantages
Advantage Amplification - Key points
An MDP/RL Formulation - Objective: maximize cumulative user engagement over a session
The Problem: Item Interaction - The presence of some items on the slate impacts user response, and hence the value of others
User Choice: Assumptions - Two key, but reasonable, assumptions
Full Q-Learning - Decomposition still holds, standard Q-learning update
Slate Optimization: Tractable - Standard formulation: fractional mixed-integer program
Synthetic Experiments - Synthetic environment
Robustness to User Choice Models - Change user choice model to cascade (Joachims 2002)
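
To make the item-interaction and decomposition topics in the outline concrete, here is a minimal Python sketch. It is not taken from the lecture. It assumes a multinomial-logit user choice model with a "no click" option that contributes zero value, and it picks the best slate by brute-force enumeration, which is only feasible for tiny catalogs (the outline lists a fractional mixed-integer program as the standard formulation). The function names, `scores`, `item_q`, and `null_score` are all hypothetical.

```python
import math
from itertools import combinations


def choice_probs(scores, slate, null_score=0.0):
    """Probability that the user picks each item on the slate.

    Assumed model: multinomial logit over the slate items plus a
    'no click' option whose score is `null_score`.
    """
    weights = {item: math.exp(scores[item]) for item in slate}
    total = sum(weights.values()) + math.exp(null_score)
    return {item: w / total for item, w in weights.items()}


def slate_q(scores, item_q, slate, null_score=0.0):
    """Slate value as a choice-weighted sum of item-level Q-values.

    Assumes the 'no click' outcome contributes zero value.
    """
    probs = choice_probs(scores, slate, null_score)
    return sum(probs[item] * item_q[item] for item in slate)


def best_slate(scores, item_q, k, null_score=0.0):
    """Brute-force search over all size-k slates (illustration only)."""
    return max(
        combinations(list(scores), k),
        key=lambda slate: slate_q(scores, item_q, slate, null_score),
    )


if __name__ == "__main__":
    # Hypothetical catalog: user-appeal scores and long-term item values.
    scores = {"a": 2.0, "b": 0.0, "c": 0.0}
    item_q = {"a": 0.1, "b": 2.0, "c": 3.0}
    print(slate_q(scores, item_q, ("b", "c")))
    print(slate_q(scores, item_q, ("a", "c")))
    print(best_slate(scores, item_q, k=2))
```

In this toy catalog, item "a" is highly appealing but has low long-term value, so placing it on a slate draws choice probability away from "c" and lowers the slate's value. That is the item-interaction effect the outline refers to: the value of one item depends on what else is shown alongside it.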

Taught by

Simons Institute

Related Courses

Introduction to Recommender Systems
University of Minnesota via Coursera
Text Retrieval and Search Engines
University of Illinois at Urbana-Champaign via Coursera
Machine Learning: Recommender Systems & Dimensionality Reduction
University of Washington via Coursera
Java Programming: Build a Recommendation System
Duke University via Coursera
Introduction to Recommender Systems: Non-Personalized and Content-Based
University of Minnesota via Coursera