RecoMind: A Reinforcement Learning Framework for Optimizing In-Session User Satisfaction in Recommendation Systems

📅 2025-07-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of applying reinforcement learning (RL) in large-scale recommendation systems—namely, intractably large action spaces (on the order of hundreds of millions of items) and high engineering complexity—this paper proposes a simulator-driven lightweight RL framework. First, it constructs a high-fidelity simulation environment grounded in an existing supervised recommendation model to enable cold-start RL policy training. Second, it introduces a directed exploration strategy to mitigate sparse reward signals and combinatorial action-space explosion. Third, it employs behavior cloning for policy initialization to accelerate convergence. Crucially, the framework integrates seamlessly into industrial-grade recommendation pipelines without requiring modifications to online serving infrastructure. Extensive offline simulations and live A/B tests on a major video platform demonstrate significant improvements in in-session user engagement: a 15.81% increase in videos watched for over 10 seconds, and a 4.71% improvement in session depth for sessions with at least 10 interactions.

📝 Abstract
Existing web-scale recommendation systems commonly use supervised learning methods that prioritize immediate user feedback. Although reinforcement learning (RL) offers a solution to optimize longer-term goals, such as in-session engagement, applying it at web scale is challenging due to the extremely large action space and engineering complexity. In this paper, we introduce RecoMind, a simulator-based RL framework designed for the effective optimization of session-based goals at web scale. RecoMind leverages existing recommendation models to establish a simulation environment and to bootstrap the RL policy to optimize immediate user interactions from the outset. This method integrates well with existing industry pipelines, simplifying the training and deployment of RL policies. Additionally, RecoMind introduces a custom exploration strategy to efficiently explore web-scale action spaces with hundreds of millions of items. We evaluated RecoMind through extensive offline simulations and online A/B testing on a video streaming platform. Both methods showed that the RL policy trained using RecoMind significantly outperforms traditional supervised learning recommendation approaches in in-session user satisfaction. In online A/B tests, the RL policy increased videos watched for more than 10 seconds by 15.81% and improved session depth by 4.71% for sessions with at least 10 interactions. As a result, RecoMind presents a systematic and scalable approach for embedding RL into web-scale recommendation systems, showing great promise for optimizing session-based user satisfaction.
Problem

Research questions and friction points this paper is trying to address.

Optimizing long-term in-session user satisfaction in recommendations
Scaling reinforcement learning for web-scale recommendation systems
Reducing engineering complexity in large action space scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulator-based RL framework for web-scale recommendations
Leverages existing models to bootstrap RL policy
Custom exploration strategy for large action spaces
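The recipe in the points above (use an existing supervised model both as a simulator and to bootstrap the RL policy via behavior cloning, then fine-tune with RL inside the simulator) can be sketched as a toy example. Everything here is hypothetical: the catalog size, the linear scorers, and the plain REINFORCE update are stand-ins chosen for brevity; the paper's actual policy, simulator, and exploration strategy are learned models operating over hundreds of millions of items.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, STATE_DIM, LR = 20, 4, 0.5  # toy sizes, not the paper's

# Hidden "true user" model that drives the simulator's rewards.
true_W = rng.normal(size=(N_ITEMS, STATE_DIM))

# An imperfect existing supervised ranker (a noisy copy of the truth),
# standing in for the production recommendation model.
supervised_W = true_W + 0.5 * rng.normal(size=true_W.shape)

# Behavior cloning (trivial here): initialize the RL policy from the
# supervised model so it recommends sensibly from the outset.
policy_W = supervised_W.copy()

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def avg_reward(W, n=500):
    """Fraction of simulated sessions where argmax(W @ s) is the best item."""
    hits = 0.0
    for _ in range(n):
        s = rng.normal(size=STATE_DIM)
        hits += float(int(np.argmax(W @ s)) == int(np.argmax(true_W @ s)))
    return hits / n

# RL fine-tuning inside the simulator: sample an item from the softmax
# policy, collect the simulated in-session reward, and apply a plain
# REINFORCE (score-function) update on the policy weights.
for _ in range(3000):
    s = rng.normal(size=STATE_DIM)
    probs = softmax(policy_W @ s)
    a = int(rng.choice(N_ITEMS, p=probs))
    r = float(a == int(np.argmax(true_W @ s)))  # simulated user feedback
    grad = -np.outer(probs, s)                  # d log pi(a|s) / d W
    grad[a] += s
    policy_W += LR * r * grad

print("supervised:", avg_reward(supervised_W), "fine-tuned:", avg_reward(policy_W))
```

The design point the sketch mirrors is that the RL policy never needs live traffic to start learning: the supervised model provides both a warm start (cloned weights) and a reward-bearing environment, so RL fine-tuning happens entirely offline before deployment.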
Mehdi Ben Ayed
Pinterest Inc., New York, USA
Fei Feng
Pinterest Inc., San Francisco, USA
Jay Adams
Pinterest Inc., San Francisco, USA
Vishwakarma Singh
Pinterest Inc., San Francisco, USA
Kritarth Anand
Pinterest Inc., San Francisco, USA
Jiajing Xu
Pinterest
Recommendation system · Information retrieval · Deep learning