Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Agentic RAG systems often search inefficiently, either over-searching (retrieving redundant information) or under-searching (failing to retrieve necessary information), because models are uncertain about the boundaries of their own knowledge. This hurts both efficiency and reliability. This work formally defines and quantifies these behaviors across multiple QA datasets and agentic RAG systems, and shows that response accuracy correlates with the model's uncertainty in its search decisions. To address this, it proposes β-GRPO, a reinforcement learning training method that incorporates a confidence threshold to reward high-certainty search decisions. On seven QA benchmarks, β-GRPO gives a 3B-parameter model a 4% higher average exact match score than other strong baselines. The analysis also finds that one model could have avoided searching in 27.7% of its search steps, which shows how common over-search is.

📝 Abstract

Agentic Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by enabling dynamic, multi-step reasoning and information retrieval. However, these systems often exhibit sub-optimal search behaviors such as over-search (retrieving redundant information) and under-search (failing to retrieve necessary information), which hinder efficiency and reliability. This work formally defines and quantifies these behaviors, revealing their prevalence across multiple QA datasets and agentic RAG systems (e.g., one model could have avoided searching in 27.7% of its search steps). Furthermore, we demonstrate a crucial link between these inefficiencies and the models' uncertainty regarding their own knowledge boundaries, where response accuracy correlates with the model's uncertainty in its search decisions. To address this, we propose $\beta$-GRPO, a reinforcement learning-based training method that incorporates a confidence threshold to reward high-certainty search decisions. Experiments on seven QA benchmarks show that $\beta$-GRPO gives a 3B model stronger agentic RAG ability, outperforming other strong baselines with a 4% higher average exact match score.

Problem

Research questions and friction points this paper is trying to address.

Addresses sub-optimal search behaviors in agentic RAG systems

Links search inefficiencies to model uncertainty in knowledge boundaries

Proposes a reinforcement learning method to improve search decisions
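
To make the two failure modes concrete, here is a minimal sketch of how over-search and under-search rates might be computed from logged agent steps. The `Step` record, its `correct_without_search` label, and both rate definitions are assumptions made for illustration; the abstract does not spell out the paper's exact formulas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    # Did the agent issue a search call at this step?
    searched: bool
    # Hypothetical label: would the model have answered this step
    # correctly from its own knowledge, without retrieval?
    correct_without_search: bool


def over_search_rate(steps: List[Step]) -> float:
    """Share of search steps the model could have skipped (assumed definition)."""
    search_steps = [s for s in steps if s.searched]
    if not search_steps:
        return 0.0
    avoidable = sum(s.correct_without_search for s in search_steps)
    return avoidable / len(search_steps)


def under_search_rate(steps: List[Step]) -> float:
    """Share of non-search steps where the model needed retrieval (assumed definition)."""
    no_search_steps = [s for s in steps if not s.searched]
    if not no_search_steps:
        return 0.0
    missed = sum(not s.correct_without_search for s in no_search_steps)
    return missed / len(no_search_steps)


# Toy trace: four search steps (one avoidable) and two non-search steps (one wrong).
trace = [
    Step(searched=True, correct_without_search=True),
    Step(searched=True, correct_without_search=False),
    Step(searched=True, correct_without_search=False),
    Step(searched=True, correct_without_search=False),
    Step(searched=False, correct_without_search=True),
    Step(searched=False, correct_without_search=False),
]
print(over_search_rate(trace), under_search_rate(trace))
```

Under these assumed definitions, the 27.7% figure in the abstract would correspond to an over-search rate of 0.277 for one model.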

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning for search optimization

Confidence threshold in search decisions

Reducing uncertainty in agentic RAG
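
The idea of rewarding only high-certainty search decisions can be sketched as a gated reward function. This is an illustrative sketch, not the paper's implementation: measuring confidence as the minimum token probability of a search query, the normalization used for exact match, and the default `beta=0.5` are all assumptions.

```python
import math
from typing import List, Sequence


def search_confidence(token_logprobs: Sequence[float]) -> float:
    """Confidence of one search call, taken here (as an assumption) to be
    the lowest token probability in the generated query. Expects a
    non-empty sequence of log-probabilities."""
    return min(math.exp(lp) for lp in token_logprobs)


def exact_match(prediction: str, gold_answers: Sequence[str]) -> bool:
    """Case- and whitespace-insensitive exact match (simplified normalization)."""
    def norm(text: str) -> str:
        return " ".join(text.lower().split())
    return norm(prediction) in {norm(g) for g in gold_answers}


def gated_reward(
    prediction: str,
    gold_answers: Sequence[str],
    search_logprobs: List[Sequence[float]],
    beta: float = 0.5,  # hypothetical threshold, not the paper's value
) -> float:
    """Return 1.0 only if the answer is correct AND every search call in
    the rollout was made with confidence >= beta; otherwise return 0.0."""
    if not exact_match(prediction, gold_answers):
        return 0.0
    confident = all(search_confidence(lps) >= beta for lps in search_logprobs)
    return 1.0 if confident else 0.0
```

In a GRPO-style setup, a scalar reward like this would be computed for each sampled rollout and then compared within the group. A rollout with no search calls passes the gate trivially in this sketch, which is a design choice rather than something the abstract specifies.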

🔎 Similar Papers

Perception Matters: Enhancing Embodied AI with Uncertainty-Aware Semantic Segmentation