SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Date: 2026-05-28
Citations: 0
Influential: 0
AI Summary
Large language models often over-search during agentic reasoning because they lack awareness of their own knowledge boundaries, which leads to high latency and computational overhead. This work proposes a reinforcement learning-based dynamic introspection mechanism that, for the first time, incorporates search boundary modeling and boundary-aware rewards, complemented by a staged curriculum optimization strategy that mitigates reward hacking. The approach regulates search behavior, reducing inference latency and computational cost while maintaining competitive question-answering accuracy.
๐Ÿ“ Abstract
Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even when adequate evidence has been collected. This lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which leverages a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search while maintaining accuracy. Our code is anonymously released at https://github.com/XMUDeepLIT/SAAS.
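The three components described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the `Rollout` type, the boundary rule (fewest searches among correct rollouts), the linear penalty with coefficient `lam`, and the hard warm-up switch in `penalty_weight` are all assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """Hypothetical summary of one sampled trajectory for a question."""
    correct: bool       # did the final answer match the reference?
    num_searches: int   # number of search calls issued in the trajectory


def search_boundary(no_search: List[Rollout], with_search: List[Rollout]) -> int:
    """Estimate how many searches the current policy needs for one question.

    Assumed rule: 0 if any search-disabled rollout is already correct;
    otherwise the fewest searches among correct search-enabled rollouts;
    -1 if no rollout is correct (boundary unknown).
    """
    if any(r.correct for r in no_search):
        return 0
    solved = [r.num_searches for r in with_search if r.correct]
    return min(solved) if solved else -1


def penalty_weight(step: int, warmup_steps: int) -> float:
    """Stage-wise schedule (assumed): no search penalty during the
    reasoning-first stage, full penalty afterwards."""
    return 0.0 if step < warmup_steps else 1.0


def boundary_aware_reward(rollout: Rollout, boundary: int,
                          weight: float, lam: float = 0.1) -> float:
    """Trajectory-level reward: correctness minus a penalty on searches
    beyond the estimated boundary.

    Assumption: only correct rollouts are penalized, so the policy cannot
    gain reward by simply refusing to search and answering wrongly.
    """
    base = 1.0 if rollout.correct else 0.0
    if boundary < 0 or not rollout.correct:
        return base
    excess = max(0, rollout.num_searches - boundary)
    return base - weight * lam * excess
```

Under these assumptions, a correct rollout that issues three searches when the estimated boundary is one keeps its full reward during the warm-up stage and is penalized for the two excess searches once the penalty is switched on.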
Problem

Research questions and friction points this paper is trying to address.

over-search
self-awareness
agentic search
LLMs
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-awareness
over-search mitigation
agentic search
reinforcement learning
search boundary modeling
Similar Papers