Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of accurately evaluating the contribution of individual steps in information-seeking tasks, where trajectory-level rewards are often insufficient and existing step-level approaches typically rely on computationally expensive tree sampling. The authors model world knowledge as a latent entity-relation graph and frame the search process as a traversal toward the answer node within this graph. Building on this formulation, they introduce an efficient credit assignment mechanism that requires no tree sampling. The core contributions are the Graph-Distance Contribution Reward (GDCR), which scores each step by how close its newly retrieved and newly cited entities are to the answer node, and the Step Advantage Policy Optimization (SAPO) algorithm, which combines the resulting step-level advantages with trajectory-level outcome advantages. Experiments on four challenging information-seeking benchmarks show that the method significantly outperforms existing approaches, supporting its effectiveness and generalizability.
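To make the graph-distance idea concrete, here is a minimal Python sketch of a GDCR-style step reward. It assumes the training-time ER graph is given as (head, relation, tail) triples, treats edges as undirected, measures hop distance to the answer node with breadth-first search, and scores each entity that a step newly retrieves or newly cites as 1 / (1 + d). The function names, the 1 / (1 + d) scoring rule, and the equal weighting of retrieved and cited entities are illustrative assumptions, not the paper's exact formula.

```python
from collections import deque


def answer_distances(triples, answer):
    """Hop distance from each entity to the answer node (undirected BFS)."""
    adjacency = {}
    for head, _relation, tail in triples:
        adjacency.setdefault(head, set()).add(tail)
        adjacency.setdefault(tail, set()).add(head)
    dist = {answer: 0}
    frontier = deque([answer])
    while frontier:
        node = frontier.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                frontier.append(neighbor)
    return dist


def closeness(entities, dist):
    """Sum of 1 / (1 + d) over entities in the graph; off-graph entities score 0."""
    return sum(1.0 / (1 + dist[e]) for e in entities if e in dist)


def graph_distance_rewards(steps, dist):
    """Hypothetical GDCR-style step rewards.

    steps: one (retrieved_entities, cited_entities) pair per agent step.
    Only entities retrieved (or cited) for the first time earn credit.
    """
    seen_retrieved, seen_cited = set(), set()
    rewards = []
    for retrieved, cited in steps:
        new_retrieved = set(retrieved) - seen_retrieved
        new_cited = set(cited) - seen_cited
        seen_retrieved |= new_retrieved
        seen_cited |= new_cited
        rewards.append(closeness(new_retrieved, dist) + closeness(new_cited, dist))
    return rewards


# Toy ER graph: Q -> CityA -> CountryB -> Answer (all names hypothetical).
triples = [
    ("Q", "born_in", "CityA"),
    ("CityA", "capital_of", "CountryB"),
    ("CountryB", "leader", "Answer"),
]
dist = answer_distances(triples, "Answer")
steps = [
    (["Q", "Noise"], []),           # start entity plus an off-graph distractor
    (["CityA"], ["CityA"]),         # one hop closer, retrieved and cited
    (["CityA", "CountryB"], []),    # CityA is no longer new; only CountryB counts
    (["Answer"], ["Answer"]),       # reaches the answer node
]
print(graph_distance_rewards(steps, dist))
```

In this toy run the reward grows as the agent reaches entities closer to the answer, while the off-graph entity `Noise` and the repeated `CityA` earn nothing. Because all distances come from a single BFS over the training-time graph, scoring a step needs no extra rollouts, which is what lets this kind of reward avoid tree sampling.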

📝 Abstract

In Agentic Search, trajectory-level outcome rewards fail to quantify the behavioral contributions of individual steps, while existing step-level reward methods typically rely on costly tree sampling. We view world knowledge as a latent world graph and each information-seeking (IS) task as a search within a latent task graph, where effective steps should make graph progress toward the answer node. Based on this prior, we propose the Graph-Distance Contribution Reward (GDCR), a step-level process reward that scores newly retrieved and newly cited entities by their distance to the answer node in a training-time Entity-Relation (ER) graph. We further propose Step Advantage Policy Optimization (SAPO), which converts GDCR into step-level advantages and combines them with trajectory-level outcome advantages. Experiments on four challenging benchmarks validate the effectiveness of our method.
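The following Python sketch shows one plausible way to turn step-level process rewards into step-level advantages and merge them with trajectory-level outcome advantages, in the spirit of SAPO. It assumes GRPO-style group normalization (each reward is standardized against the other rollouts sampled for the same question) and a simple additive combination weighted by a hypothetical coefficient `beta`; the paper's actual normalization and weighting may differ.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-6):
    """Zero-mean, unit-variance standardization against the group (GRPO-style)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def combined_step_advantages(outcome_rewards, step_rewards, beta=0.5):
    """Hypothetical SAPO-style advantages for a group of rollouts on one question.

    outcome_rewards: one trajectory-level reward per rollout (e.g. 1.0 if correct).
    step_rewards: per-rollout lists of step-level process rewards (e.g. GDCR scores).
    beta: assumed weight on the step-level term.
    Returns one advantage per step, grouped by rollout.
    """
    outcome_adv = normalize(outcome_rewards)
    # Standardize step rewards across every step of every rollout in the group.
    flat = [r for rollout in step_rewards for r in rollout]
    flat_adv = normalize(flat)
    advantages, offset = [], 0
    for a_out, rollout in zip(outcome_adv, step_rewards):
        n = len(rollout)
        # Each step inherits its rollout's outcome advantage plus its own step term.
        advantages.append([a_out + beta * a_step for a_step in flat_adv[offset:offset + n]])
        offset += n
    return advantages


# Two rollouts: the first answers correctly, the second does not.
advantages = combined_step_advantages(
    outcome_rewards=[1.0, 0.0],
    step_rewards=[[0.25, 0.5], [0.25, 0.25]],
)
print(advantages)
```

During policy-gradient training, every token generated within a step would share that step's advantage. Setting `beta=0` recovers a pure trajectory-level outcome advantage, so `beta` controls how far the step-level signal can shift credit between steps and rollouts.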

Problem

Research questions and friction points this paper is trying to address.

Agentic Search

step-level credit assignment

trajectory rewards

reward modeling

graph-based reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-Distance Contribution Reward

Step-level Credit Assignment

Agentic Search