🤖 AI Summary
This work addresses the challenge of integrating the semantic reasoning capabilities of large language models with collaborative filtering signals to improve recommendation accuracy while keeping inference latency low. The authors propose STAR, a novel framework that, for the first time, internalizes multi-agent collaborative reasoning trajectories (planning, tool invocation, and self-reflection) into a single efficient recommender model via trajectory-driven distillation. To bridge user behavior with natural language reasoning, STAR introduces a collaborative signal translation mechanism that converts interaction histories into textual evidence, augmenting the model's reasoning process. Extensive experiments demonstrate that STAR outperforms its multi-agent teacher by 8.7%–39.5% across multiple metrics while eliminating iterative inference delays, delivering both high accuracy and real-time responsiveness in a single model.
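The collaborative signal translation step can be pictured with a minimal sketch: raw interaction tuples are rendered as natural-language evidence that an LLM can reason over. This is an illustrative toy, not the paper's implementation; the function name, tuple format, and rating threshold are all assumptions.

```python
# Illustrative sketch (hypothetical, not STAR's actual code): turn a
# user's interaction history into textual evidence for LLM reasoning.
def translate_signals(history):
    """history: list of (item, rating, category) tuples -> evidence string."""
    lines = []
    for item, rating, category in history:
        verdict = "enjoyed" if rating >= 4 else "disliked"  # assumed threshold
        lines.append(f"The user {verdict} '{item}' ({category}, rated {rating}/5).")
    # Summarize the dominant liked category as an aggregate preference cue.
    liked = [c for _, r, c in history if r >= 4]
    if liked:
        top = max(set(liked), key=liked.count)
        lines.append(f"Overall, the user shows a preference for {top} items.")
    return " ".join(lines)

evidence = translate_signals([
    ("Dune", 5, "sci-fi"),
    ("The Notebook", 2, "romance"),
    ("Interstellar", 4, "sci-fi"),
])
print(evidence)
```

The resulting evidence string would then be appended to the model's prompt so that latent behavioral patterns become explicit text rather than opaque ID embeddings.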
📝 Abstract
Large Language Models (LLMs) are reshaping recommender systems by leveraging extensive world knowledge and semantic reasoning to interpret user intent. However, effectively integrating these capabilities with collaborative signals while avoiding prohibitive inference latency remains a critical bottleneck. To address this, we propose a trajectory-driven internalization framework that develops a Single-agent Trajectory-Aligned Recommender (STAR). Specifically, to internalize complex reasoning capabilities into a single efficient model, we first design a multi-agent teacher system capable of multi-turn tool usage and reflection. This teacher uses a Collaborative Signal Translation mechanism to explicitly convert latent behavioral patterns into descriptive natural-language evidence, improving reasoning accuracy. A trajectory-driven distillation pipeline then transfers this agentic logic (planning, tool usage, and self-reflection) into the compact STAR model. Extensive experiments demonstrate that STAR surpasses its teacher by 8.7% to 39.5% across metrics while eliminating iterative inference latency, paving the way for real-time, reasoning-enhanced recommendation.
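The trajectory-driven distillation pipeline can be sketched as a data-preparation step: each teacher trajectory (plan, tool call, reflection steps, plus the final recommendation) is flattened into a single supervised prompt-completion pair for fine-tuning the student. This is a hedged illustration; the field names, step tags, and serialization format are assumptions, not the paper's specification.

```python
# Hypothetical sketch of distillation data preparation: a multi-agent
# teacher trajectory is serialized into one SFT training example so the
# single student model learns to emit plan/tool/reflect steps itself.
def trajectory_to_example(user_query, trajectory, recommendation):
    """trajectory: list of {'type': ..., 'content': ...} steps."""
    steps = [f"[{step['type'].upper()}] {step['content']}" for step in trajectory]
    steps.append(f"[RECOMMEND] {recommendation}")  # final answer as last step
    return {"prompt": user_query, "completion": "\n".join(steps)}

example = trajectory_to_example(
    "Suggest a movie for user #42.",
    [
        {"type": "plan", "content": "Check recent sci-fi interactions."},
        {"type": "tool", "content": "lookup_history(user=42) -> Dune, Interstellar"},
        {"type": "reflect", "content": "Strong sci-fi preference; avoid romance."},
    ],
    "Arrival",
)
print(example["completion"])
```

At inference time the student generates the whole serialized trajectory in one pass, which is how iterative multi-agent latency is eliminated while the reasoning structure is preserved.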