Reinforcement Learning in Recommender Systems - Some Challenges
Offered By: Simons Institute via YouTube
Course Description
Overview
Syllabus
Intro
RL in User-Facing/Interactive Systems: RL has found tremendous success with deep models
Some Challenges in User-Facing RL (RecSys): Scale (number of users, multi-user MDPs; combinatorial actions, slates); idiosyncratic nature of actions
I. Stochastic Action Sets
SAS-MDPs: Constructing an MDP
SAS-MDPs: Solving Extended MDP
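The SAS-MDP construction above treats the realized action set as part of the state. A minimal tabular sketch of the resulting Q-learning variant, assuming a hypothetical environment API (available_actions, step): the only change from standard Q-learning is that both action selection and the max in the backup range over the actions actually available, not the full base set.

```python
import random
from collections import defaultdict

# Minimal sketch of Q-learning in a SAS-MDP: at each state only a
# random subset of the base actions is available, so the greedy choice
# and the max in the backup run over that realized subset.
# The environment API (available_actions, step) is hypothetical.

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(float)  # Q[(state, action)]

def sas_q_step(env, state, avail):
    if random.random() < EPS:
        action = random.choice(list(avail))       # explore within A(s)
    else:
        action = max(avail, key=lambda a: Q[(state, a)])
    next_state, reward, done = env.step(action)
    next_avail = env.available_actions(next_state)  # realized set at s'
    target = reward if done else (
        reward + GAMMA * max(Q[(next_state, a)] for a in next_avail))
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
    return next_state, next_avail, done
```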
II. User-Learning over Long Horizons: Evidence of (very) slow user learning and adaptation
Advantage Amplification: Temporal aggregation (e.g., fixed actions) can help amplify advantages
Advantage Amplification: Key points
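A minimal sketch of the temporal-aggregation mechanism named above, assuming hypothetical env/agent objects: the agent commits to one action for a fixed window and trains on the aggregated transition, so small per-step advantage gaps compound instead of being washed out by noisy, slowly adapting user behavior.

```python
# Minimal sketch of temporal aggregation for advantage amplification:
# the agent holds a chosen action fixed for HOLD_K consecutive steps,
# accumulating discounted reward over the window, and trains on the
# aggregated k-step transition rather than per-step ones.
# env/agent APIs here are hypothetical placeholders.

HOLD_K, GAMMA = 5, 0.99

def aggregated_step(env, agent, state):
    start, action = state, agent.select(state)   # pick once...
    total, discount, done = 0.0, 1.0, False
    for _ in range(HOLD_K):                      # ...then hold it fixed
        state, reward, done = env.step(action)
        total += discount * reward
        discount *= GAMMA
        if done:
            break
    agent.update(start, action, total, state, discount, done)
    return state, done
```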
An MDP/RL Formulation: Objective: maximize cumulative "user engagement" over a session
The Problem: Item Interaction: The presence of some items on the slate impacts the user's response to, hence the value of, the others
User Choice: Assumptions: Two key, but reasonable, assumptions
Full Q-Learning: Decomposition still holds; standard Q-learning update
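A minimal sketch of the item-level decomposition and its Q-learning update, assuming the two choice assumptions above amount to: the user consumes at most one item from the slate, and reward/transition depend only on the consumed item. choice_prob and the tabular Qbar are placeholders, and the backup here bootstraps from the next slate actually shown (a SARSA-style simplification) rather than the optimal next slate.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Qbar = defaultdict(float)  # item-level values, keyed by (state, item)

def slate_value(s, slate, choice_prob):
    # Decomposition: Q(s, A) = sum over items i in A of P(i | s, A) * Qbar(s, i)
    return sum(choice_prob(s, i, slate) * Qbar[(s, i)] for i in slate)

def item_q_update(s, item, reward, s_next, next_slate, choice_prob):
    # Standard per-item backup against the value of the next slate shown.
    target = reward + GAMMA * slate_value(s_next, next_slate, choice_prob)
    Qbar[(s, item)] += ALPHA * (target - Qbar[(s, item)])
```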
Slate Optimization: Tractable: Standard formulation is a fractional mixed-integer program
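The exact slate optimization is the fractional mixed-integer program named above. A common, cheaper stand-in, sketched here under a hypothetical per-item attractiveness score v(s, i), is to rank candidates by attractiveness-weighted item value and take the top k; this approximates, but is not, the exact optimizer.

```python
def greedy_slate(s, candidates, Qbar, v, k):
    # Heuristic: score each candidate item by v(s, i) * Qbar(s, i)
    # (attractiveness times item-level value) and keep the top k.
    # v and Qbar are hypothetical placeholders from the sketch above.
    ranked = sorted(candidates, key=lambda i: v(s, i) * Qbar[(s, i)], reverse=True)
    return ranked[:k]
```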
Synthetic Experiments: Synthetic environment
Robustness to User Choice Models: Change the user choice model to a cascade model (Joachims 2002)
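A minimal sketch of a cascade-style choice model as used in that robustness check (attribution per the talk slide), assuming a hypothetical per-item attractiveness probability: the user scans the slate top to bottom and consumes the first item that satisfies them, so lower positions are examined only if every higher item was skipped.

```python
import random

# Minimal sketch of a cascade-style user choice model:
# scan the slate in order, consume the first satisfying item.
# attract(item) is a hypothetical attractiveness probability.

def cascade_choice(slate, attract):
    for position, item in enumerate(slate):
        if random.random() < attract(item):
            return item, position   # first satisfying item is consumed
    return None, None               # user abandons without consuming
```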
Taught by
Simons Institute
Related Courses
Discrete Optimization
University of Melbourne via Coursera
Solving Algorithms for Discrete Optimization
University of Melbourne via Coursera
Mathematical Optimization for Business Problems
IBM via Cognitive Class
Optimisation - Linear Integer Programming - Professor Raphael Hauser
Alan Turing Institute via YouTube
Neural Network Verification as Piecewise Linear Optimization
Institute for Pure & Applied Mathematics (IPAM) via YouTube