Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward

📅 2026-01-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a gap in existing retrieval-augmented reasoning methods: they lack dense, principled reward signals for optimizing information acquisition. The authors propose InfoReasoner, a framework that formalizes information gain as the reduction in uncertainty over the model's belief states and introduces an output-aware intrinsic reward estimator that requires no human annotations. The approach clusters model outputs semantically using bidirectional textual entailment, derives rewards from the resulting information gain, and trains the policy with Group Relative Policy Optimization (GRPO). Evaluated across seven question-answering benchmarks, InfoReasoner improves average accuracy by up to 5.4% over retrieval-augmented baselines, indicating that the reward is effective at guiding models to retrieve and reason more efficiently.
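To make the idea of information gain as uncertainty reduction concrete, here is a minimal sketch, not the authors' estimator. It assumes that answers sampled before and after a retrieval step are grouped into meaning clusters whenever an entailment check holds in both directions, and that uncertainty is the entropy over cluster frequencies. The function names and the `toy_entails` check are hypothetical; a real system would use an NLI model in place of the string comparison. Note that this sample-based difference can come out negative, unlike the non-negativity guarantee the paper states for its theoretical definition.

```python
import math
from typing import Callable, List, Sequence

Entails = Callable[[str, str], bool]


def cluster_by_entailment(answers: Sequence[str], entails: Entails) -> List[List[str]]:
    """Greedily group answers that entail each other in both directions."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            representative = cluster[0]
            if entails(answer, representative) and entails(representative, answer):
                cluster.append(answer)
                break
        else:
            # No existing cluster matched, so this answer starts a new one.
            clusters.append([answer])
    return clusters


def semantic_entropy(answers: Sequence[str], entails: Entails) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    clusters = cluster_by_entailment(answers, entails)
    total = len(answers)
    return -sum((len(c) / total) * math.log(len(c) / total) for c in clusters)


def information_gain(before: Sequence[str], after: Sequence[str], entails: Entails) -> float:
    """Uncertainty before retrieval minus uncertainty after retrieval."""
    return semantic_entropy(before, entails) - semantic_entropy(after, entails)


def toy_entails(a: str, b: str) -> bool:
    """Hypothetical stand-in for an NLI model: case-insensitive exact match."""
    return a.strip().lower() == b.strip().lower()


# Answers sampled before retrieval disagree; after retrieval they agree.
before = ["Paris", "paris", "Lyon", "Nice"]
after = ["Paris", "paris ", "PARIS", "Paris"]
gain = information_gain(before, after, toy_entails)
```

In this toy case the pre-retrieval samples fall into three clusters and the post-retrieval samples into one, so the gain equals the full pre-retrieval entropy.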

📝 Abstract

Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduce InfoReasoner, a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward. Theoretically, we redefine information gain as uncertainty reduction over the model's belief states, establishing guarantees including non-negativity, telescoping additivity, and channel monotonicity. Practically, to enable scalable optimization without manual retrieval annotations, we propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions using semantic clustering via bidirectional textual entailment. This intrinsic reward guides the policy to maximize epistemic progress, enabling efficient training via Group Relative Policy Optimization (GRPO). Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines, achieving up to 5.4% average accuracy improvement. Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner
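As an illustration of how a scalar reward such as information gain could feed into GRPO, the sketch below computes group-relative advantages: each sampled rollout's reward is standardized against the mean and standard deviation of its own group. This shows only the generic normalization step, under the assumption that one reward is assigned per rollout; how InfoReasoner combines the information gain reward with other terms is not specified here. The function name and `eps` parameter are hypothetical.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize each reward against its group's mean and standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one question, each scored with a hypothetical reward.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
flat = group_relative_advantages([0.3, 0.3, 0.3])
```

Rollouts that score above their group's average receive positive advantages and the rest negative ones; when all rewards in a group are equal, every advantage is zero and the group contributes no gradient signal.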

Problem

Research questions and friction points this paper is trying to address.

agentic reasoning

retrieval optimization

information gain

reward signal

large reasoning models

Innovation

Methods, ideas, or system contributions that make the work stand out.

information gain

agentic reasoning

retrieval-augmented generation