🤖 AI Summary
This work addresses the credit-assignment challenge faced by information-seeking agents in open-ended web environments, where low signal-to-noise feedback and sparse outcome rewards hinder effective learning. The authors propose a visual-native search framework that represents webpages as visual snapshots, so that agents can use layout cues to locate salient evidence, and introduce an Information-Aware Credit Assignment (ICA) mechanism. ICA is a post-hoc method: it estimates the contribution of each retrieved snapshot to the final outcome through posterior analysis and propagates dense learning signals back to the key search turns. Integrated with a GRPO-based reinforcement learning pipeline, the method consistently outperforms text-based baselines on diverse information-seeking benchmarks, providing evidence that visual snapshot grounding combined with information-level credit assignment helps in long-horizon search tasks.
📝 Abstract
Although information-seeking agents trained with reinforcement learning achieve strong performance, learning in open-ended web environments remains severely constrained by low signal-to-noise feedback. Text-based parsers often discard layout semantics and introduce unstructured noise, while long-horizon training typically relies on sparse outcome rewards that obscure which retrieval actions actually matter. We propose a visual-native search framework that represents webpages as visual snapshots, allowing agents to leverage layout cues to quickly localize salient evidence and suppress distractors. To learn effectively from these high-dimensional observations, we introduce Information-Aware Credit Assignment (ICA), a post-hoc method that estimates each retrieved snapshot's contribution to the final outcome via posterior analysis and propagates dense learning signals back to key search turns. Integrated with a GRPO-based training pipeline, our approach consistently outperforms text-based baselines on diverse information-seeking benchmarks, providing evidence that visual snapshot grounding with information-level credit assignment alleviates the credit-assignment bottleneck in open-ended web environments. The code and datasets will be released at https://github.com/pc-inno/ICA_MM_deepsearch.git.
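To make the idea of dense, turn-level credit concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes the standard GRPO step of standardizing outcome rewards within a group of rollouts, and then assumes, purely for illustration, that each search turn already has a non-negative information score. The abstract does not specify how the posterior analysis computes such scores or how they enter the loss, so `info_scores` and `turn_level_advantages` are hypothetical names and the proportional split is an assumed rule.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize outcome rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def turn_level_advantages(advantage, info_scores, eps=1e-8):
    """Hypothetical rule: split one trajectory-level advantage across search turns
    in proportion to per-turn information scores (assumed non-negative)."""
    n = len(info_scores)
    if n == 0:
        return []
    total = sum(info_scores)
    if total <= eps:
        # No turn stands out, so fall back to a uniform split.
        return [advantage / n] * n
    return [advantage * s / total for s in info_scores]


# Four rollouts for the same query, with sparse 0/1 outcome rewards.
group_adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])

# Assumed information scores for the three search turns of the first rollout.
per_turn = turn_level_advantages(group_adv[0], [0.0, 3.0, 1.0])
```

Under this assumed rule the per-turn values sum to the trajectory-level advantage, so the outcome signal is redistributed rather than rescaled. Whether uninformative turns in a failed rollout should receive less blame or more is a design choice that the abstract leaves open.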