Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of estimating how much each individual step contributes in information-seeking (IS) tasks. Trajectory-level outcome rewards cannot separate the contributions of individual steps, and existing step-level reward methods typically rely on computationally expensive tree sampling. The authors model world knowledge as a latent entity-relation graph and frame the search process as a traversal toward the answer node within this graph. Under this view, an effective step is one that moves the agent closer to the answer node, which yields a credit assignment mechanism that does not require tree sampling. The two core contributions are the Graph-Distance Contribution Reward (GDCR) and the Step Advantage Policy Optimization (SAPO) algorithm. The authors report that the method outperforms existing approaches on four challenging information-seeking benchmarks.
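To make the traversal framing concrete, here is a minimal Python sketch under our own assumptions, not the authors' code: the entity-relation graph is a small hand-written list of (head, relation, tail) triples, edges are treated as undirected, and "graph progress" for a step is measured as the drop in the smallest hop distance to the answer node reached so far. The functions `hop_distances` and `graph_progress` and all entity names are hypothetical.

```python
from collections import deque


def hop_distances(edges, answer):
    """BFS from the answer node over an undirected entity-relation graph.

    edges: (head, relation, tail) triples; relation labels are ignored here.
    Returns {entity: hops to answer} for every entity connected to the answer.
    """
    adj = {}
    for head, _relation, tail in edges:
        adj.setdefault(head, set()).add(tail)
        adj.setdefault(tail, set()).add(head)
    dist = {answer: 0}
    queue = deque([answer])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def graph_progress(seed_entities, steps, dist):
    """Per step, how many hops the closest entity reached so far moved toward the answer.

    Assumes at least one seed (question) entity is on the graph.
    Entities missing from `dist` (off-graph) are ignored.
    """
    best = min(dist[e] for e in seed_entities if e in dist)
    progress = []
    for entities in steps:
        reached = [dist[e] for e in entities if e in dist]
        new_best = min([best] + reached)
        progress.append(best - new_best)
        best = new_best
    return progress


# Hypothetical 3-hop question: "Which country is the director of FilmA from?"
edges = [
    ("FilmA", "directed_by", "DirectorB"),
    ("DirectorB", "born_in", "CityC"),
    ("CityC", "located_in", "CountryD"),
    ("FilmA", "produced_by", "StudioE"),  # distractor branch
]
dist = hop_distances(edges, answer="CountryD")
steps = [["StudioE", "Unrelated"], ["DirectorB"], ["CityC"], ["CountryD"]]
print(graph_progress(["FilmA"], steps, dist))  # [0, 1, 1, 1]
```

The first step only touches a distractor branch (and an off-graph entity), so it earns no progress; each later step closes one hop. Tracking the best distance so far means revisiting a nearer entity is never counted twice.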
📝 Abstract
In Agentic Search, trajectory-level outcome rewards fail to quantify the behavioral contributions of individual steps, while existing step-level reward methods typically rely on costly tree sampling. We view world knowledge as a latent world graph and each information-seeking (IS) task as search within a latent task graph, where effective steps should make graph progress toward the answer node. Based on this prior, we propose Graph-Distance Contribution Reward (GDCR), a step-level process reward that scores newly retrieved and newly cited entities by their distance to the answer node in a training-time Entity-Relation (ER) graph. We further propose Step Advantage Policy Optimization (SAPO), which converts GDCR into step-level advantages and combines them with trajectory-level outcome advantages. Experiments on four challenging benchmarks validate the effectiveness of our method.
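The abstract does not spell out GDCR's scoring function or how SAPO normalizes and mixes the two advantage signals, so the following Python sketch fills those gaps with assumptions of our own: each entity retrieved or cited for the first time in a trajectory earns 1 / (1 + d), where d is its hop distance to the answer node; outcome advantages are group-normalized across rollouts in the GRPO style; and step rewards are normalized across all steps in the group and added with a hypothetical weight `beta`. The names `gdcr_step_rewards`, `sapo_advantages`, `beta`, the per-step (retrieved, cited) format, and the toy distance table are illustrative only, not the paper's API.

```python
import statistics


def gdcr_step_rewards(steps, dist):
    """Hypothetical GDCR: reward entities retrieved or cited for the first time.

    steps: per-step (retrieved_entities, cited_entities) pairs.
    dist: hop distance from each entity to the answer node in the ER graph.
    Each new on-graph entity adds 1 / (1 + distance); repeats and off-graph
    entities add nothing.
    """
    seen = set()
    rewards = []
    for retrieved, cited in steps:
        new = (set(retrieved) | set(cited)) - seen
        seen |= new
        rewards.append(sum((1.0 / (1 + dist[e]) for e in new if e in dist), 0.0))
    return rewards


def _normalize(values, eps=1e-6):
    """Shift to zero mean and scale by population std (eps avoids divide-by-zero)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


def sapo_advantages(step_rewards, outcome_rewards, beta=0.5):
    """Hypothetical SAPO mix: per-step advantage = outcome adv + beta * step adv.

    step_rewards: one list of GDCR rewards per trajectory in the group.
    outcome_rewards: one scalar outcome reward per trajectory.
    """
    outcome_adv = _normalize(outcome_rewards)
    # Normalize step rewards across every step of every trajectory in the group.
    flat = _normalize([r for traj in step_rewards for r in traj])
    advantages, i = [], 0
    for traj, a_out in zip(step_rewards, outcome_adv):
        advantages.append([a_out + beta * s for s in flat[i:i + len(traj)]])
        i += len(traj)
    return advantages


# Toy distances to the answer "CountryD"; "StudioE" is off-graph in this table.
dist = {"CountryD": 0, "CityC": 1, "DirectorB": 2, "FilmA": 3}
success = [(["DirectorB"], []), (["CityC"], ["DirectorB"]), (["CountryD"], ["CityC"])]
failure = [(["StudioE"], []), (["FilmA"], [])]
rewards = [gdcr_step_rewards(success, dist), gdcr_step_rewards(failure, dist)]
adv = sapo_advantages(rewards, outcome_rewards=[1.0, 0.0])
# adv[0] is about [0.875, 1.125, 1.875]; adv[1] is about [-1.625, -1.25]
```

Within the successful rollout, the step that reaches the answer receives the largest advantage, while every step of the failed rollout stays below it. Normalizing the step rewards over the whole group keeps them on the same scale as the outcome advantages; whether SAPO normalizes per step position, per trajectory, or not at all is not stated in the abstract.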
Problem

Agentic Search
step-level credit assignment
trajectory rewards
reward modeling
graph-based reasoning
Innovation

Graph-Distance Contribution Reward
Step-level Credit Assignment
Agentic Search
Latent World Graph
Step Advantage Policy Optimization