Self-Distilled Reinforcement Learning for Co-Evolving Agentic Recommender Systems

📅 2026-04-11

🤖 AI Summary

This work addresses limitations of existing agent-based recommender systems. These systems rely heavily on external textual memory and fail to internalize interaction experience into model parameters. They also overlook two properties of multi-turn interactions: the dense supervisory signals spread throughout them, and the mutual influence between the two agents. To address these challenges, the authors propose CoARS, a self-distilled reinforcement learning framework in which the recommender agent and the user agent co-evolve. CoARS derives coupled task-level rewards and token-level credit-assignment signals from interaction trajectories. This turns historical interactions into fine-grained learning supervision and goes beyond conventional reinforcement learning, which depends solely on sparse final rewards. Experiments on multiple datasets show that CoARS outperforms representative agentic recommender baselines in both recommendation performance and alignment with user preferences.
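The idea of coupled task-level rewards can be made concrete with a small sketch. The key property is that a single interaction trajectory yields a reward for each agent. The page does not give CoARS's actual reward definitions, so everything below is hypothetical: `Turn`, `reciprocal_rank`, `coupled_rewards`, the blending weight `alpha`, the choice of reciprocal rank for the recommender, and feedback consistency for the user agent are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One round of a (hypothetical) recommender/user dialogue."""
    recommended: List[str]  # items the recommender agent proposed this turn
    user_accepted: bool     # whether the user agent accepted the proposal


def reciprocal_rank(items: List[str], target: str) -> float:
    """Return 1/rank of target in items (1-indexed), or 0.0 if absent."""
    for rank, item in enumerate(items, start=1):
        if item == target:
            return 1.0 / rank
    return 0.0


def coupled_rewards(
    trajectory: List[Turn], target: str, alpha: float = 0.5
) -> Tuple[float, float]:
    """Derive task-level rewards for BOTH agents from ONE trajectory.

    Assumed (not taken from the paper): the recommender is scored on its
    final list, and the user agent on whether its accept/reject feedback
    agrees with the held-out target. The user agent's score is blended with
    the recommender's outcome so that the two rewards are coupled.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one turn")
    # Recommender agent: quality of the final recommendation list.
    r_rec = reciprocal_rank(trajectory[-1].recommended, target)
    # User agent: fraction of turns whose feedback is consistent with the target.
    consistent = sum(
        (target in turn.recommended) == turn.user_accepted for turn in trajectory
    )
    r_feedback = consistent / len(trajectory)
    # Coupling: the user agent also shares the recommender's outcome,
    # since its feedback is what steers the recommender toward the target.
    r_user = alpha * r_feedback + (1.0 - alpha) * r_rec
    return r_rec, r_user


if __name__ == "__main__":
    traj = [Turn(["a", "b", "c"], False), Turn(["d", "x", "b"], True)]
    print(coupled_rewards(traj, target="x"))  # (0.5, 0.75)
```

In the example, the user agent correctly rejects the first list and accepts the second, so its feedback is fully consistent. The recommender places the target second, which gives a reciprocal rank of 0.5. Because of the coupling, the user agent is only fully rewarded when the recommender also succeeds.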

📝 Abstract

Large language model-empowered agentic recommender systems (ARS) reformulate recommendation as a multi-turn interaction between a recommender agent and a user agent, enabling iterative preference elicitation and refinement beyond conventional one-shot prediction. However, existing ARS are mainly optimized in a Reflexion-style paradigm, where past interaction trajectories are stored as textual memory and retrieved as prompt context for later reasoning. Although this design allows agents to recall prior feedback and observations, the accumulated experience remains external to model parameters, leaving agents reliant on generic reasoning rather than progressively acquiring recommendation-specific decision-making ability through learning. Reinforcement learning (RL) therefore provides a natural way to internalize such interaction experience into parameters. Yet existing RL methods for ARS still suffer from two key limitations. First, they fail to capture the interactive nature of ARS, in which the recommender agent and the user agent continuously influence each other and can naturally generate endogenous supervision through interaction feedback. Second, they reduce a rich multi-turn interaction process to final outcomes, overlooking the dense supervision embedded throughout the trajectory. To this end, we propose CoARS, a self-distilled reinforcement learning framework for co-evolving agentic recommender systems. CoARS introduces two complementary learning schemes: interaction reward, which derives coupled task-level supervision for the recommender agent and the user agent from the same interaction trajectory, and self-distilled credit assignment, which converts historical trajectories into token-level credit signals under teacher-student conditioning. Experiments on multiple datasets show that CoARS outperforms representative ARS baselines in recommendation performance and user alignment.
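Self-distilled credit assignment under teacher-student conditioning can be illustrated with a minimal sketch, built on one plausible reading that the abstract does not confirm. In this reading, the teacher is the same policy conditioned on additional context, such as a historical trajectory, and the student sees only the current prompt. Each sampled token then receives the log-probability gap between the two conditionings as its credit. The function names `token_credits` and `token_advantages` and the weight `beta` are hypothetical. The logits stand in for whatever the language model would produce.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_credits(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    tokens: Sequence[int],
) -> List[float]:
    """Per-token credit = log p_teacher(y_t) - log p_student(y_t).

    Assumption: the teacher is the same policy conditioned on extra context
    (e.g. a historical trajectory), the student sees only the current prompt,
    and both are scored on the same sampled response tokens.
    """
    if not (len(teacher_logits) == len(student_logits) == len(tokens)):
        raise ValueError("teacher, student, and tokens must have equal length")
    credits = []
    for t_row, s_row, y in zip(teacher_logits, student_logits, tokens):
        credits.append(log_softmax(t_row)[y] - log_softmax(s_row)[y])
    return credits


def token_advantages(
    credits: Sequence[float], task_reward: float, beta: float = 1.0
) -> List[float]:
    """Broadcast the sparse task reward to every token and add dense credit."""
    return [task_reward + beta * c for c in credits]


if __name__ == "__main__":
    # Toy vocabulary of size 2 and a response of 2 tokens.
    teacher = [[2.0, 0.0], [0.0, 0.0]]
    student = [[0.0, 0.0], [0.0, 0.0]]
    credits = token_credits(teacher, student, tokens=[0, 1])
    print(credits)  # first token about 0.566, second 0.0
    print(token_advantages(credits, task_reward=1.0, beta=0.5))
```

Tokens that the history-conditioned teacher finds more likely than the student receive positive credit. Tokens on which the two agree receive none. This adds dense, per-token signal on top of the single sparse task reward that outcome-only RL would spread uniformly over the sequence.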

Problem

Research questions and friction points this paper is trying to address.

Agentic Recommender Systems

Reinforcement Learning

Co-Evolution

Interaction Trajectory

Self-Distillation

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-distilled reinforcement learning

agentic recommender systems

co-evolving agents