Think before Recommendation: Autonomous Reasoning-enhanced Recommender

📅 2025-10-27
🤖 AI Summary
Existing knowledge-distillation-based recommendation methods suffer from weak teacher models, static and costly supervision signals, and superficial transfer of reasoning ability. To address these limitations, this paper proposes RecZero, a knowledge-distillation-free, reinforcement-learning-driven paradigm for autonomous reasoning-enhanced recommendation. Its core is a structured “think-then-recommend” prompting mechanism, coupled with rule-based reward modeling and Group Relative Policy Optimization (GRPO), which together train a single LLM end to end for user preference understanding and rating prediction. RecZero lets the LLM acquire reasoning capabilities on its own, without reasoning traces distilled from a teacher model. Experiments on multiple benchmark datasets show that it significantly outperforms existing baseline methods, supporting reinforcement learning as a way to build reasoning-enhanced recommender systems.

📝 Abstract
The core task of recommender systems is to learn user preferences from historical user-item interactions. With the rapid development of large language models (LLMs), recent research has explored using the reasoning capabilities of LLMs to improve rating prediction. However, existing distillation-based methods are limited by the teacher model's insufficient recommendation capability, by costly and static supervision, and by superficial transfer of reasoning ability. To address these issues, this paper proposes RecZero, a reinforcement learning (RL)-based recommendation paradigm that abandons the traditional multi-model, multi-stage distillation approach. Instead, RecZero trains a single LLM through pure RL to autonomously develop reasoning capabilities for rating prediction. RecZero consists of two key components: (1) "Think-before-Recommendation" prompt construction, which employs a structured reasoning template to guide the model through a step-wise analysis of user interests, item features, and user-item compatibility; and (2) rule-based reward modeling, which scores the sampled reasoning trajectories with rule-based rewards and optimizes the LLM with group relative policy optimization (GRPO). Additionally, the paper explores a hybrid paradigm, RecOne, which combines supervised fine-tuning with RL: the model is first initialized with cold-start reasoning samples and then further optimized with RL. Experimental results show that RecZero and RecOne significantly outperform existing baseline methods on multiple benchmark datasets, supporting the RL paradigm as a route to autonomous reasoning-enhanced recommender systems.
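To make the two components described in the abstract more concrete, here is a minimal Python sketch of a structured reasoning prompt and a rule-based reward. Everything specific in it is an assumption made for illustration and is not taken from the paper: the tag names (`user_analysis`, `item_analysis`, `match`, `rating`), the 1 to 5 rating scale, the penalty of -1.0 for malformed output, and the linear error-to-reward mapping.

```python
import re

# Hypothetical template: the abstract only says the template guides step-wise
# analysis of user interests, item features, and user-item compatibility.
# The tag names and wording below are illustrative assumptions.
TEMPLATE = (
    "Review the user's history and the candidate item, then answer in this format:\n"
    "<user_analysis>...</user_analysis>\n"
    "<item_analysis>...</item_analysis>\n"
    "<match>...</match>\n"
    "<rating>a number from 1 to 5</rating>\n\n"
    "User history:\n{history}\n\nCandidate item:\n{item}\n"
)


def build_prompt(history, item):
    """Fill the template from (title, rating) pairs and a candidate item title."""
    lines = "\n".join(f"- {title}: {rating}/5" for title, rating in history)
    return TEMPLATE.format(history=lines, item=item)


# The completion must contain the four sections in order, ending with a number.
_PATTERN = re.compile(
    r"\s*<user_analysis>.+?</user_analysis>\s*"
    r"<item_analysis>.+?</item_analysis>\s*"
    r"<match>.+?</match>\s*"
    r"<rating>\s*([0-9]+(?:\.[0-9]+)?)\s*</rating>\s*",
    re.DOTALL,
)


def rule_based_reward(completion, true_rating, max_error=4.0):
    """Assumed reward rule: -1.0 for malformed or out-of-range output,
    otherwise 1.0 minus the absolute rating error scaled by max_error."""
    m = _PATTERN.fullmatch(completion)
    if m is None:
        return -1.0
    predicted = float(m.group(1))
    if not 1.0 <= predicted <= 5.0:
        return -1.0
    return 1.0 - abs(predicted - true_rating) / max_error
```

Under these assumptions, a well-formed completion that predicts 4 for a true rating of 5 receives a reward of 0.75, while free-form text with no tags receives -1.0. The reward is computed by fixed rules rather than by a learned reward model or a teacher, which is the property the abstract emphasizes.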
Problem

Research questions and friction points this paper is trying to address.

Improve rating prediction by having the recommender develop reasoning ability on its own
Overcome the limitations of distillation-based methods with a reinforcement learning approach
Train a single LLM to analyze user interests, item features, and user-item compatibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses pure reinforcement learning, without distillation, to develop autonomous reasoning
Implements structured prompts for step-wise user-item analysis
Adopts rule-based rewards with group relative policy optimization (GRPO)
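The "group relative" part of GRPO can be illustrated with a short Python sketch: several completions are sampled for the same prompt, each is scored by the rule-based reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` term, and the choice of population standard deviation are assumptions for illustration; implementations differ in these details, and the paper's exact objective is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one prompt's sampled completions within the group.

    A completion scoring above the group mean gets a positive advantage and one
    scoring below it gets a negative advantage, so no separate value network is
    needed as a baseline. `eps` (an assumed constant) avoids division by zero
    when every completion in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions sampled for one prompt, scored by a rule-based reward.
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

In this example the first completion gets a positive advantage, the third a negative one, and the two average completions get zero, so the policy update pushes probability toward the better reasoning trajectories within each group.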
Xiaoyu Kong
Taobao & Tmall Group of Alibaba, China
Junguang Jiang
Taobao & Tmall Group of Alibaba, China
Bin Liu
Taobao & Tmall Group of Alibaba, China
Ziru Xu
Alibaba Group
Han Zhu
Taobao & Tmall Group of Alibaba, China
Jian Xu
Taobao & Tmall Group of Alibaba, China
Bo Zheng
Taobao & Tmall Group of Alibaba, China
Jiancan Wu
University of Science and Technology of China
LLMs, Recommendation, Graph Neural Network
Xiang Wang
National University of Singapore