AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents

πŸ“… 2026-03-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the disconnect between reasoning and ranking feedback in existing large language model–based recommender agents, which hinders fine-grained user preference modeling. To bridge this gap, the authors propose AgenticRec, a framework that integrates recommendation-specific tools into the ReAct reasoning loop, enabling end-to-end joint optimization of reasoning, tool invocation, and ranking generation. Key innovations include a list-wise unbiased Group Relative Policy Optimization (List-wise GRPO) for precise credit assignment, a Progressive Preference Refinement (PPR) mechanism to mitigate preference ambiguity, and a combination of hard negative mining with bidirectional preference alignment to enhance training efficiency. Extensive experiments demonstrate that AgenticRec significantly outperforms state-of-the-art methods across multiple benchmarks, validating the effectiveness of unifying reasoning and ranking optimization.
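The tool-integrated ReAct loop described above can be illustrated with a minimal sketch. Note this is a hypothetical stand-in: the paper's actual tool set, agent interface, and function names (`react_recommend`, `lookup_history`, `item_profile`, `scripted_step`) are assumptions for illustration, not AgenticRec's implementation.

```python
# Hypothetical minimal ReAct-style loop with recommendation tools.
# Tool names and stub outputs are illustrative, not from the paper.
TOOLS = {
    "lookup_history": lambda user: ["item_42", "item_7"],  # stub: recent interactions
    "item_profile": lambda item: {"genre": "sci-fi"},      # stub: item metadata
}

def react_recommend(user, llm_step, max_steps=4):
    """Alternate Thought -> Action (tool call) -> Observation until the
    policy emits a final ranked list (the trajectory AgenticRec optimizes)."""
    trajectory = []
    for _ in range(max_steps):
        thought, action, arg = llm_step(user, trajectory)  # policy picks next move
        if action == "finish":
            return arg  # arg is the final ranking list
        observation = TOOLS[action](arg)
        trajectory.append((thought, action, observation))
    return []  # ran out of steps without a final answer

# Scripted policy standing in for the LLM, for illustration only
def scripted_step(user, trajectory):
    if not trajectory:
        return ("check interaction history", "lookup_history", user)
    return ("enough evidence gathered", "finish", ["item_42", "item_7", "item_3"])

ranking = react_recommend("u1", scripted_step)
```

Under end-to-end optimization, the entire trajectory (thoughts, tool calls, and the final list) would receive ranking feedback, rather than the list alone.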

πŸ“ Abstract
Recommender agents built on Large Language Models offer a promising paradigm for recommendation. However, existing recommender agents typically suffer from a disconnect between intermediate reasoning and final ranking feedback, and are unable to capture fine-grained preferences. To address this, we present AgenticRec, a ranking-oriented agentic recommendation framework that optimizes the entire decision-making trajectory (including intermediate reasoning, tool invocation, and final ranking list generation) under sparse implicit feedback. Our approach makes three key contributions. First, we design a suite of recommendation-specific tools integrated into a ReAct loop to support evidence-grounded reasoning. Second, we propose theoretically unbiased List-Wise Group Relative Policy Optimization (list-wise GRPO) to maximize ranking utility, ensuring accurate credit assignment for complex tool-use trajectories. Third, we introduce Progressive Preference Refinement (PPR) to resolve fine-grained preference ambiguities. By mining hard negatives from ranking violations and applying bidirectional preference alignment, PPR minimizes the convex upper bound of pairwise ranking errors. Experiments on benchmarks confirm that AgenticRec significantly outperforms baselines, validating the necessity of unifying reasoning, tool use, and ranking optimization.
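The list-wise GRPO idea can be sketched as follows: sample a group of ranking trajectories for the same user, score each list with a ranking utility (NDCG is used here as a plausible choice; the paper's exact reward is not specified in this summary), and normalize each reward against the group's statistics to obtain advantages. All function names below are illustrative assumptions.

```python
import math

def ndcg_at_k(ranked_items, relevant, k=10):
    """Ranking reward: NDCG@k of a generated list against implicit feedback.
    (Illustrative reward choice; the paper's exact utility may differ.)"""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def group_relative_advantages(rewards):
    """GRPO-style credit assignment: center and scale each trajectory's
    reward by its sampling group's mean and standard deviation."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # avoid division by zero for identical rewards
    return [(r - mean) / std for r in rewards]

# Three sampled ranking lists for the same user query
relevant = {"A", "B"}
group = [["A", "B", "C"], ["C", "A", "B"], ["C", "D", "E"]]
rewards = [ndcg_at_k(lst, relevant) for lst in group]
advantages = group_relative_advantages(rewards)
```

Trajectories whose lists rank relevant items higher get positive advantages and are reinforced; the group-relative baseline removes the need for a separate value model, which is the standard GRPO trade-off.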
Problem

Research questions and friction points this paper is trying to address.

recommender agents
ranking-oriented
fine-grained preferences
reasoning-ranking disconnect
implicit feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic Recommendation
List-wise GRPO
Progressive Preference Refinement
Tool-Integrated Reasoning
Ranking-Oriented Optimization
Tianyi Li (Xiamen University)
Zixuan Wang (Xiamen University)
Guidong Lei (Xiamen University)
Xiaodong Li (Xiamen University)
Hui Li (Xiamen University)
Information Retrieval Β· Data Mining Β· Data Management