STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning

📅 2025-08-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing recommender systems suffer from static user modeling and passive response mechanisms; LLM-based agents inherit these limitations, resulting in superficial relevance bias, weak causal reasoning, and poor robustness under data sparsity. To address this, the authors propose STARec, a slow-thinking augmented recommender agent framework: a dual-track cognitive architecture that pairs rapid-response execution with deliberate chain-of-thought (CoT) reasoning. STARec employs anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping, to enable dynamic policy adaptation and causal inference. Evaluated on MovieLens-1M and Amazon CDs, STARec achieves significant improvements over state-of-the-art methods while using only 0.4% of the training data. This demonstrates substantial advances in deep reasoning, sparse-data generalization, and robust decision-making, marking a shift from reactive to reflective recommendation.

📝 Abstract
While modern recommender systems are instrumental in navigating information abundance, they remain fundamentally limited by static user modeling and reactive decision-making paradigms. Current large language model (LLM)-based agents inherit these shortcomings through their overreliance on heuristic pattern matching, yielding recommendations prone to shallow correlation bias, limited causal inference, and brittleness in sparse-data scenarios. We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. Each user is modeled as an agent with parallel cognitions: fast response for immediate interactions and slow reasoning that generates chain-of-thought rationales. To cultivate intrinsic slow thinking, we develop anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. This hybrid approach scaffolds agents in acquiring foundational capabilities (preference summarization, rationale generation) while enabling dynamic policy adaptation through simulated feedback loops. Experiments on the MovieLens-1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines, despite using only 0.4% of the full training data.
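The abstract's dual-track design, a fast track for immediate interactions and a slow track that produces chain-of-thought rationales, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: `UserAgent`, `fast_response`, `slow_reasoning`, and the token-overlap scoring are all assumed names and stand-in logic.

```python
from dataclasses import dataclass, field

@dataclass
class UserAgent:
    """Toy sketch of a dual-track user agent (illustrative only):
    the fast track ranks candidates against a cached preference
    summary; the slow track deliberately revises that summary
    from feedback, standing in for chain-of-thought reasoning."""
    preference_summary: str = "no preferences recorded yet"
    history: list = field(default_factory=list)

    def fast_response(self, candidates):
        # Fast track: cheap heuristic ranking, no deliberation.
        return sorted(candidates,
                      key=self._match_score,
                      reverse=True)

    def slow_reasoning(self, feedback):
        # Slow track: deliberate update step (stubbed) that turns
        # observed feedback into a rationale and a revised summary.
        self.history.append(feedback)
        self.preference_summary = "; ".join(
            f"user {'liked' if f['liked'] else 'disliked'} {f['item']}"
            for f in self.history)
        return self.preference_summary

    def _match_score(self, item):
        # Toy relevance proxy: token overlap with the summary.
        return sum(tok in self.preference_summary for tok in item.split())

agent = UserAgent()
agent.slow_reasoning({"item": "jazz album", "liked": True})
ranked = agent.fast_response(["rock single", "jazz album", "jazz compilation"])
print(ranked[0])  # jazz album
```

The point of the split is cost: the fast track runs on every request, while the expensive deliberate update runs only when new feedback arrives.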
Problem

Research questions and friction points this paper is trying to address.

Overcomes static user modeling and reactive decision limitations
Addresses shallow correlation bias and limited causal inference
Improves recommendation robustness in sparse-data scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomous deliberate reasoning agent framework
Anchored reinforcement training with knowledge distillation
Hybrid fast-slow cognition for user modeling
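The two-stage "anchored reinforcement training" named above can be illustrated with a deliberately simplified numeric toy: stage 1 distills the policy toward a teacher anchor, stage 2 performs reward-shaped updates penalized for drifting from that anchor. The scalar policy, `reward_fn`, and the quadratic anchor penalty are assumptions for illustration; the paper's actual method operates on LLM agents, not scalars.

```python
def anchored_training(policy, teacher_anchor, reward_fn,
                      distill_steps=50, rl_steps=200, lr=0.05, beta=0.2):
    """Toy two-stage loop (illustrative, not the paper's algorithm):
    stage 1 = knowledge distillation toward a teacher 'anchor';
    stage 2 = reinforcement with preference-aligned reward shaping,
    regularized so the policy stays near the distilled anchor."""
    # Stage 1: distillation pulls the policy toward the teacher.
    for _ in range(distill_steps):
        policy += lr * (teacher_anchor - policy)

    # Stage 2: shaped reward = environment reward minus an
    # anchor-drift penalty; ascend it by finite differences.
    eps = 1e-3
    def shaped(p):
        return reward_fn(p) - beta * (p - teacher_anchor) ** 2
    for _ in range(rl_steps):
        grad = (shaped(policy + eps) - shaped(policy - eps)) / (2 * eps)
        policy += lr * grad
    return policy

# Toy environment: reward peaks at 1.0; teacher anchors at 0.6.
final = anchored_training(policy=0.0, teacher_anchor=0.6,
                          reward_fn=lambda p: -(p - 1.0) ** 2)
print(round(final, 2))  # 0.93 -- between the anchor and the reward peak
```

The converged policy sits between the teacher's behavior and the reward optimum, which is the qualitative effect the anchoring is meant to have: feedback-driven adaptation without abandoning the distilled foundation.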
Chenghao Wu
Gaoling School of Artificial Intelligence, Renmin University of China
Ruiyang Ren
Renmin University of China
Information Retrieval, Natural Language Processing, Large Language Models
Junjie Zhang
Gaoling School of Artificial Intelligence, Renmin University of China
Ruirui Wang
Poisson Lab, Huawei
Zhongrui Ma
Poisson Lab, Huawei
Qi Ye
Poisson Lab, Huawei
Wayne Xin Zhao
Professor, Renmin University of China
Recommender Systems, Natural Language Processing, Large Language Models