ICA: Information-Aware Credit Assignment for Visually Grounded Long-Horizon Information-Seeking Agents

📅 2026-02-11
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
This work addresses the credit assignment problem faced by information-seeking agents in open-ended web environments, where low signal-to-noise feedback and sparse outcome rewards hinder learning. The authors propose a visual-native search framework that represents webpages as visual snapshots so the agent can exploit layout cues, and introduce Information-Aware Credit Assignment (ICA), a post-hoc mechanism that estimates each retrieved snapshot's contribution to the final outcome through posterior analysis and propagates dense learning signals to the key search turns. Integrated into a GRPO-based reinforcement learning pipeline, the method consistently outperforms text-based baselines on multiple information-seeking benchmarks, which the authors present as evidence that visual snapshot grounding combined with information-level credit assignment eases the credit-assignment bottleneck in long-horizon search.

📝 Abstract
Despite the strong performance achieved by reinforcement learning-trained information-seeking agents, learning in open-ended web environments remains severely constrained by low signal-to-noise feedback. Text-based parsers often discard layout semantics and introduce unstructured noise, while long-horizon training typically relies on sparse outcome rewards that obscure which retrieval actions actually matter. We propose a visual-native search framework that represents webpages as visual snapshots, allowing agents to leverage layout cues to quickly localize salient evidence and suppress distractors. To learn effectively from these high-dimensional observations, we introduce Information-Aware Credit Assignment (ICA), a post-hoc method that estimates each retrieved snapshot's contribution to the final outcome via posterior analysis and propagates dense learning signals back to key search turns. Integrated with a GRPO-based training pipeline, our approach consistently outperforms text-based baselines on diverse information-seeking benchmarks, providing evidence that visual snapshot grounding with information-level credit assignment alleviates the credit-assignment bottleneck in open-ended web environments. The code and datasets will be released at https://github.com/pc-inno/ICA_MM_deepsearch.git.
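
To make the idea of propagating dense signals to individual search turns concrete, here is a minimal sketch of how a per-turn information credit might be combined with a GRPO-style group-relative advantage. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the posterior analysis produces a score, so the per-turn `info_credits` values, the additive shaping rule, and the `beta` weight are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style normalization: center and scale the outcome rewards
    of all rollouts sampled for the same query."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def turn_level_advantages(outcome_adv, info_credits, beta=0.5):
    """Hypothetical shaping rule (an assumption, not taken from the paper):
    add a centered per-turn information credit to the rollout's
    outcome-level advantage, so turns whose snapshots contributed more
    receive a larger learning signal."""
    if not info_credits:
        return []
    baseline = mean(info_credits)
    return [outcome_adv + beta * (c - baseline) for c in info_credits]


# Toy group of four rollouts for one query: sparse 0/1 outcome rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
# Made-up per-turn scores in [0, 1], standing in for the estimated
# contribution of the snapshot retrieved at each search turn.
credits = [[0.1, 0.8, 0.1], [0.3, 0.3], [0.0, 0.0, 0.0], [0.9, 0.1]]

advs = group_relative_advantages(rewards)
per_turn = [turn_level_advantages(a, c) for a, c in zip(advs, credits)]
```

In this toy setup every turn of a rollout would otherwise share the same outcome-level advantage; the centered credit term only redistributes signal among the turns of a rollout, so turns with uninformative snapshots are not rewarded as strongly as the turn that surfaced the decisive evidence.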
Problem

Research questions and friction points this paper is trying to address.

credit assignment
information-seeking agents
web environments
sparse rewards
visual grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Information-Aware Credit Assignment
visual grounding
long-horizon reinforcement learning
web navigation
dense reward propagation
Cong Pang
Student, National University of Singapore
Mobile network, Multimedia system, Sensor Systems, Games
Xuyu Feng
Wuhan University; SenseTime Research
Yujie Yi
Shanghai Jiao Tong University; SenseTime Research
Zixuan Chen
SenseTime Research
Jiawei Hong
SenseTime Research
Tiankuo Yao
SenseTime Research
Nang Yuan
Shanghai Jiao Tong University; SenseTime Research
Jiapeng Luo
SenseTime Research
Lewei Lu
Research Director (We're Hiring, luotto@sensetime.com) @ SenseTime Research
Computer Vision, Deep Learning
Xin Lou
ShanghaiTech University
Circuits and Systems for Neural Rendering, Digital VLSI Systems