Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of dense, principled reward signals for optimizing information acquisition in existing retrieval-augmented reasoning methods. The authors propose InfoReasoner, a framework that formalizes information gain as the reduction in uncertainty over the model's belief states and introduces an output-aware intrinsic reward estimator that requires no human annotations. The approach clusters model outputs semantically using bidirectional textual entailment, derives the reward from the resulting information gain, and trains the policy with Group Relative Policy Optimization (GRPO). Evaluated on seven question-answering benchmarks, InfoReasoner improves average accuracy by up to 5.4% over strong retrieval-augmented baselines, indicating that the reward guides models to retrieve and reason more effectively.
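As a rough illustration of the GRPO step mentioned in the summary, the sketch below computes group-relative advantages in the commonly described way: each rollout's reward is standardized against the other rollouts sampled for the same question. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for this example; the summary does not specify the exact normalization InfoReasoner uses.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Standardize rewards within one group of rollouts for the same prompt.

    Assumption: population standard deviation plus a small eps for stability;
    real implementations may use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one question, scored by some reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Rollouts that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to form a baseline.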

📝 Abstract
Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, yet optimizing the retrieval process remains challenging due to the lack of dense, principled reward signals. In this paper, we introduce InfoReasoner, a unified framework that incentivizes effective information seeking via a synthetic semantic information gain reward. Theoretically, we redefine information gain as uncertainty reduction over the model's belief states, establishing guarantees including non-negativity, telescoping additivity, and channel monotonicity. Practically, to enable scalable optimization without manual retrieval annotations, we propose an output-aware intrinsic estimator that computes information gain directly from the model's output distributions using semantic clustering via bidirectional textual entailment. This intrinsic reward guides the policy to maximize epistemic progress, enabling efficient training via Group Relative Policy Optimization (GRPO). Experiments across seven question-answering benchmarks demonstrate that InfoReasoner consistently outperforms strong retrieval-augmented baselines, achieving up to 5.4% average accuracy improvement. Our work provides a theoretically grounded and scalable path toward agentic reasoning with retrieval. The code is available at https://github.com/dl-m9/InfoReasoner
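To make the output-aware estimator more concrete, here is a minimal sketch of one way such a reward could be computed: sample answers before and after a retrieval step, cluster them by bidirectional entailment, and take the drop in entropy over clusters as the information gain. This is not the authors' implementation. All function names are hypothetical, and `toy_entails` is a stand-in (case-insensitive exact match) for a real textual entailment model.

```python
import math
from typing import Callable, List, Sequence

Entails = Callable[[str, str], bool]


def cluster_by_bidirectional_entailment(
    answers: Sequence[str], entails: Entails
) -> List[List[str]]:
    """Greedily group answers that entail a cluster representative in both directions."""
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            rep = cluster[0]  # the first member represents the cluster
            if entails(ans, rep) and entails(rep, ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])  # no match found: start a new meaning cluster
    return clusters


def semantic_entropy(answers: Sequence[str], entails: Entails) -> float:
    """Shannon entropy (in nats) of the empirical distribution over meaning clusters."""
    n = len(answers)
    clusters = cluster_by_bidirectional_entailment(answers, entails)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


def information_gain_reward(
    before: Sequence[str], after: Sequence[str], entails: Entails
) -> float:
    """Uncertainty reduction between answers sampled before and after retrieval."""
    return semantic_entropy(before, entails) - semantic_entropy(after, entails)


def toy_entails(a: str, b: str) -> bool:
    # Hypothetical stand-in for an NLI model: case-insensitive exact match.
    return a.strip().lower() == b.strip().lower()


# Answers sampled before retrieval disagree; after retrieval they converge.
before = ["Paris", "Lyon", "paris", "Marseille"]
after = ["Paris", "paris", "Paris", "PARIS"]
reward = information_gain_reward(before, after, toy_entails)
```

Because each step's reward in this sketch is a difference of entropies, rewards summed over consecutive retrieval steps collapse to the entropy before the first step minus the entropy after the last, which is one way to read the telescoping additivity mentioned in the abstract.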
Problem

Research questions and friction points this paper is trying to address.

agentic reasoning
retrieval optimization
information gain
reward signal
large reasoning models
Innovation

Methods, ideas, or system contributions that make the work stand out.

information gain
agentic reasoning
retrieval-augmented generation
intrinsic reward
semantic clustering
Senkang Hu
Hong Kong JC STEM Lab of Smart City, City University of Hong Kong
Yong Dai
Fudan University
Yuzhi Zhao
Ph.D., City University of Hong Kong; B.Eng., Huazhong University of Science and Technology
Low-level Vision, Computational Photography, LLM, MLLM
Yihang Tao
City University of Hong Kong
Collaborative Perception, Autonomous Driving, World Model
Yu Guo
City University of Hong Kong
Computer Vision, Generative Models
Zhengru Fang
Hong Kong JC STEM Lab of Smart City, City University of Hong Kong
Sam Tak Wu Kwong
Lingnan University
Yuguang Fang
Hong Kong JC STEM Lab of Smart City, City University of Hong Kong