AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents

πŸ“… 2026-03-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the disconnect between reasoning and ranking feedback in existing large language model–based recommender agents, which hinders fine-grained user preference modeling. To bridge this gap, the authors propose AgenticRec, a framework that integrates recommendation-specific tools into the ReAct reasoning loop, enabling end-to-end joint optimization of reasoning, tool invocation, and ranking generation. Key innovations include a list-wise unbiased Group Relative Policy Optimization (List-wise GRPO) for precise credit assignment, a Progressive Preference Refinement (PPR) mechanism to mitigate preference ambiguity, and a combination of hard negative mining with bidirectional preference alignment to enhance training efficiency. Extensive experiments demonstrate that AgenticRec significantly outperforms state-of-the-art methods across multiple benchmarks, validating the effectiveness of unifying reasoning and ranking optimization.
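The tool-integrated ReAct loop described above can be illustrated with a minimal sketch. Note this is a hypothetical stand-in: the paper's actual tool set, agent interface, and function names (`react_recommend`, `lookup_history`, `item_profile`, `scripted_step`) are assumptions for illustration, not AgenticRec's implementation.

```python
# Hypothetical minimal ReAct-style loop with recommendation tools.
# Tool names and stub outputs are illustrative, not from the paper.
TOOLS = {
    "lookup_history": lambda user: ["item_42", "item_7"],  # stub: recent interactions
    "item_profile": lambda item: {"genre": "sci-fi"},      # stub: item metadata
}

def react_recommend(user, llm_step, max_steps=4):
    """Alternate Thought -> Action (tool call) -> Observation until the
    policy emits a final ranked list (the trajectory AgenticRec optimizes)."""
    trajectory = []
    for _ in range(max_steps):
        thought, action, arg = llm_step(user, trajectory)  # policy picks next move
        if action == "finish":
            return arg  # arg is the final ranking list
        observation = TOOLS[action](arg)
        trajectory.append((thought, action, observation))
    return []  # ran out of steps without a final answer

# Scripted policy standing in for the LLM, for illustration only
def scripted_step(user, trajectory):
    if not trajectory:
        return ("check interaction history", "lookup_history", user)
    return ("enough evidence gathered", "finish", ["item_42", "item_7", "item_3"])

ranking = react_recommend("u1", scripted_step)
```

Under end-to-end optimization, the entire trajectory (thoughts, tool calls, and the final list) would receive ranking feedback, rather than the list alone.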

πŸ“ Abstract
Recommender agents built on Large Language Models offer a promising paradigm for recommendation. However, existing recommender agents typically suffer from a disconnect between intermediate reasoning and final ranking feedback, and are unable to capture fine-grained preferences. To address this, we present AgenticRec, a ranking-oriented agentic recommendation framework that optimizes the entire decision-making trajectory (including intermediate reasoning, tool invocation, and final ranking list generation) under sparse implicit feedback. Our approach makes three key contributions. First, we design a suite of recommendation-specific tools integrated into a ReAct loop to support evidence-grounded reasoning. Second, we propose theoretically unbiased List-Wise Group Relative Policy Optimization (list-wise GRPO) to maximize ranking utility, ensuring accurate credit assignment for complex tool-use trajectories. Third, we introduce Progressive Preference Refinement (PPR) to resolve fine-grained preference ambiguities. By mining hard negatives from ranking violations and applying bidirectional preference alignment, PPR minimizes the convex upper bound of pairwise ranking errors. Experiments on benchmarks confirm that AgenticRec significantly outperforms baselines, validating the necessity of unifying reasoning, tool use, and ranking optimization.
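The list-wise GRPO idea can be sketched as follows: sample a group of ranking trajectories for the same user, score each list with a ranking utility (NDCG is used here as a plausible choice; the paper's exact reward is not specified in this summary), and normalize each reward against the group's statistics to obtain advantages. All function names below are illustrative assumptions.

```python
import math

def ndcg_at_k(ranked_items, relevant, k=10):
    """Ranking reward: NDCG@k of a generated list against implicit feedback.
    (Illustrative reward choice; the paper's exact utility may differ.)"""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def group_relative_advantages(rewards):
    """GRPO-style credit assignment: center and scale each trajectory's
    reward by its sampling group's mean and standard deviation."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # avoid division by zero for identical rewards
    return [(r - mean) / std for r in rewards]

# Three sampled ranking lists for the same user query
relevant = {"A", "B"}
group = [["A", "B", "C"], ["C", "A", "B"], ["C", "D", "E"]]
rewards = [ndcg_at_k(lst, relevant) for lst in group]
advantages = group_relative_advantages(rewards)
```

Trajectories whose lists rank relevant items higher get positive advantages and are reinforced; the group-relative baseline removes the need for a separate value model, which is the standard GRPO trade-off.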
Problem

Research questions and friction points this paper is trying to address.

recommender agents
ranking-oriented
fine-grained preferences
reasoning-ranking disconnect
implicit feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic Recommendation
List-wise GRPO
Progressive Preference Refinement
Tool-Integrated Reasoning
Ranking-Oriented Optimization
Tianyi Li (Xiamen University)
Zixuan Wang (Xiamen University)
Guidong Lei (Xiamen University)
Xiaodong Li (Xiamen University)
Hui Li (Xiamen University)
Information Retrieval Β· Data Mining Β· Data Management