AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems

📅 2023-10-06
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address environmental non-stationarity in recommender systems—arising from continuously evolving user behaviors (e.g., dynamic interaction rates and retention tendencies)—this paper proposes an adaptive reinforcement learning framework for optimizing long-term user retention. Methodologically: (1) we design a novel state abstraction module jointly trained with a performance-aligned value loss to enhance policy generalization; (2) we introduce a gatekeeping exploration mechanism based on performance-guided rejection sampling to mitigate implicit cold-start issues. The approach integrates deep reinforcement learning, state abstraction modeling, value-guided training, and hybrid online simulation–real-platform evaluation. Extensive experiments on a user retention simulator, the MovieLens dataset, and a live short-video platform demonstrate that our method consistently outperforms all baselines, achieving significant improvements in both long-term retention rate and policy robustness.
📝 Abstract
The field of Reinforcement Learning (RL) has garnered increasing attention for its ability to optimize user retention in recommender systems. A primary obstacle in this optimization process is the environment non-stationarity stemming from the continual and complex evolution of user behavior patterns over time, such as variations in interaction rates and retention propensities. These changes pose significant challenges to existing RL algorithms for recommendation, leading to shifts in both dynamics and reward distributions. This paper introduces a novel approach called **A**daptive **U**ser **R**etention **O**ptimization (AURO) to address this challenge. To navigate the recommendation policy in non-stationary environments, AURO introduces a state abstraction module in the policy network. The module is trained with a new value-based loss function that aligns its output with the estimated performance of the current policy. Because the policy performance of RL is sensitive to environment drifts, this loss function makes the state abstraction reflective of environment changes and signals the recommendation policy to adapt accordingly. Additionally, the non-stationarity of the environment introduces the problem of implicit cold start, where the recommendation policy continuously interacts with users displaying novel behavior patterns. AURO encourages exploration guarded by performance-based rejection sampling to maintain stable recommendation quality in the cost-sensitive online environment. Extensive empirical analyses are conducted in a user retention simulator, on the MovieLens dataset, and on a live short-video recommendation platform, demonstrating AURO's superior performance against all evaluated baseline algorithms.
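The value-aligned training signal for the state abstraction can be sketched roughly as follows. This is a minimal illustration, not AURO's actual method: `abstraction`, `states`, and `returns` are hypothetical stand-ins, and a plain regression loss toward observed returns substitutes for the paper's value-based loss.

```python
import numpy as np

def value_aligned_loss(abstraction, states, returns):
    # Hypothetical sketch: the abstraction maps raw states to a scalar
    # summary, and the loss pulls that summary toward observed returns,
    # so the abstraction tracks policy performance and therefore shifts
    # when the environment (and hence the return distribution) drifts.
    preds = np.array([abstraction(s) for s in states])
    return float(np.mean((preds - returns) ** 2))

# Toy linear abstraction over 3-dimensional states (illustrative only).
w = np.array([0.5, -0.2, 0.1])
abstraction = lambda s: s @ w
states = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
returns = np.array([0.9, 0.0])  # pretend discounted returns
loss = value_aligned_loss(abstraction, states, returns)
```

In the paper the abstraction is a module inside the policy network trained jointly with the policy; here a fixed linear map stands in so the loss computation is concrete.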
Problem

Research questions and friction points this paper is trying to address.

Optimizes user retention in recommender systems
Addresses non-stationarity in user behavior patterns
Mitigates implicit cold start with novel behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning for retention
State abstraction module adaptation
Performance-based rejection sampling
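The gatekeeping exploration idea above can be illustrated with a minimal sketch. The function names, the threshold rule, and the toy Q-values are assumptions for illustration, not the paper's implementation: an exploratory action is accepted only if its estimated value stays close to that of the trusted policy's action, otherwise the system falls back to the trusted policy to keep recommendation quality stable.

```python
import random

def gatekept_action(state, policy, explore_policy, value_fn, threshold, max_tries=10):
    # Hypothetical gate: accept an exploratory action only if its
    # estimated value is within `threshold` of the current policy's
    # action value; otherwise reject and fall back to the policy.
    baseline = value_fn(state, policy(state))
    for _ in range(max_tries):
        candidate = explore_policy(state)
        if value_fn(state, candidate) >= baseline - threshold:
            return candidate  # exploratory action passes the gate
    return policy(state)      # exploration rejected: keep stable quality

# Toy usage with hand-crafted stand-ins for the learned components.
policy = lambda s: 0                  # current policy always picks action 0
explore_policy = lambda s: random.choice([0, 1, 2])
values = {0: 1.0, 1: 0.9, 2: 0.2}    # pretend action-value estimates
value_fn = lambda s, a: values[a]

random.seed(0)
a = gatekept_action(None, policy, explore_policy, value_fn, threshold=0.15)
```

With these toy values, action 2 can never pass the gate (0.2 < 1.0 - 0.15), so the returned action is always 0 or 1, mimicking how rejection sampling bounds the cost of exploration in a live system.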