🤖 AI Summary
Agentic RAG systems often search sub-optimally: they over-search (retrieve redundant information) or under-search (fail to retrieve necessary information), which hurts both efficiency and reliability. This work formally defines and quantifies both behaviors across multiple QA datasets and agentic RAG systems; for example, one model could have avoided searching in 27.7% of its search steps. It also links these inefficiencies to the models' uncertainty about their own knowledge boundaries, showing that response accuracy correlates with the model's uncertainty in its search decisions. To address this, the authors propose β-GRPO, a reinforcement learning-based training method that incorporates a confidence threshold to reward high-certainty search decisions. On seven QA benchmarks, β-GRPO gives a 3B-parameter model stronger agentic RAG ability, outperforming strong baselines with a 4% higher average exact match score.
📝 Abstract
Agentic Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by enabling dynamic, multi-step reasoning and information retrieval. However, these systems often exhibit sub-optimal search behaviors such as over-search (retrieving redundant information) and under-search (failing to retrieve necessary information), which hinder efficiency and reliability. This work formally defines and quantifies these behaviors, revealing their prevalence across multiple QA datasets and agentic RAG systems (e.g., one model could have avoided searching in 27.7% of its search steps). Furthermore, we demonstrate a crucial link between these inefficiencies and the models' uncertainty regarding their own knowledge boundaries: response accuracy correlates with the model's uncertainty in its search decisions. To address this, we propose $\beta$-GRPO, a reinforcement learning-based training method that incorporates a confidence threshold to reward high-certainty search decisions. Experiments on seven QA benchmarks show that $\beta$-GRPO enables a 3B model to achieve better agentic RAG ability, outperforming other strong baselines with a 4% higher average exact match score.
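As a rough illustration of what a reward that incorporates a confidence threshold could look like, the Python sketch below gates an exact-match reward on the confidence of each search decision. The abstract does not specify how confidence is measured or how the threshold enters the reward, so everything here is an assumption: confidence is taken to be the minimum token probability of a generated search query, the default `beta=0.5` is arbitrary, and the names `query_confidence` and `gated_reward` are hypothetical.

```python
import math
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    return normalize(prediction) in {normalize(g) for g in gold_answers}


def query_confidence(token_logprobs: list[float]) -> float:
    """ASSUMPTION: confidence of one search query is its minimum token probability.

    Expects a non-empty list of per-token log-probabilities.
    """
    return math.exp(min(token_logprobs))


def gated_reward(
    prediction: str,
    gold_answers: list[str],
    search_logprobs: list[list[float]],
    beta: float = 0.5,  # ASSUMPTION: illustrative threshold, not taken from the source
) -> float:
    """Return 1.0 only for a correct answer whose search queries all meet the threshold.

    `search_logprobs` holds one list of token log-probabilities per search call
    in the rollout. A rollout with no search calls passes the gate trivially.
    """
    if not exact_match(prediction, gold_answers):
        return 0.0
    if all(query_confidence(lp) >= beta for lp in search_logprobs):
        return 1.0
    return 0.0
```

In a GRPO-style setup, a scalar like this would be computed per sampled rollout and then normalized within each group to obtain advantages. A correct answer reached through a low-confidence search earns no reward under this gate, which is one way to discourage uncertain search decisions.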