🤖 AI Summary
To address the robustness issue in LLMs augmented with external search, where minor query perturbations can derail reasoning and accumulate errors, this paper proposes RE-Searcher, a framework combining goal-oriented planning with self-reflection. The agent first explicitly formulates and dynamically maintains a concrete search goal; it then iteratively inspects retrieved results to verify that the evidence is consistent with that goal and refines its reasoning trajectory accordingly. The search process is thus decoupled into three distinct phases: goal formulation, execution, and meta-cognitive evaluation. This design markedly improves resilience to noisy queries and misleading retrieval results. Experiments across multiple knowledge-intensive QA benchmarks show that the method achieves state-of-the-art (SOTA) search accuracy under standard conditions and maintains over 92% accuracy under adversarial query perturbations, a gain of more than 30 percentage points over baselines, substantially mitigating search fragility.
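The plan/execute/reflect loop summarized above can be sketched as follows. This is a minimal conceptual illustration, not the paper's implementation: every function name, the retry policy, and the toy keyword-match "search engine" are assumptions made for the example.

```python
from typing import Callable, List

def goal_directed_search(
    question: str,
    search: Callable[[str], List[str]],                 # external search tool
    satisfies_goal: Callable[[str, List[str]], bool],   # self-reflection check
    refine: Callable[[str, List[str]], str],            # query refinement
    max_rounds: int = 3,
) -> List[str]:
    """Iterate search and self-reflection until the evidence meets the goal.

    Phase 1 (goal formulation): the question is made explicit as the query.
    Phase 2 (execution): the query is sent to the search tool.
    Phase 3 (meta-cognitive evaluation): the agent reflects on whether the
    retrieved evidence satisfies the goal; if not, it refines and retries.
    """
    query = question
    evidence: List[str] = []
    for _ in range(max_rounds):
        evidence = search(query)                    # phase 2: execution
        if satisfies_goal(question, evidence):      # phase 3: reflection
            return evidence
        query = refine(query, evidence)             # revise the goal/query
    return evidence  # best effort after max_rounds

# Toy demonstration with a keyword-match "corpus" standing in for a search API.
corpus = ["Paris is the capital of France", "Berlin is in Germany"]
hits = goal_directed_search(
    "capital of France",
    search=lambda q: [d for d in corpus
                      if any(w in d.lower() for w in q.lower().split())],
    satisfies_goal=lambda goal, ev: any("capital" in d for d in ev),
    refine=lambda q, ev: q + " capital",
)
```

The reflection step is what distinguishes this loop from plain retrieval: a noisy query only triggers another refinement round rather than silently propagating bad evidence into the answer.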
📝 Abstract
Large language models (LLMs) excel at knowledge-intensive question answering and reasoning, yet their real-world deployment remains constrained by knowledge cutoff, hallucination, and limited interaction modalities. Augmenting LLMs with external search tools helps alleviate these issues, but it also exposes agents to a complex search environment in which small, plausible variations in query formulation can steer reasoning into unproductive trajectories and amplify errors. We present a systematic analysis that quantifies how environmental complexity induces fragile search behaviors and, in turn, degrades overall performance. To address this challenge, we propose a simple yet effective approach to instantiate a search agent, RE-Searcher. During search, RE-Searcher explicitly articulates a concrete search goal and subsequently reflects on whether the retrieved evidence satisfies that goal. This combination of goal-oriented planning and self-reflection enables RE-Searcher to resist spurious cues in complex search environments and perform robust search. Extensive experiments show that our method improves search accuracy and achieves state-of-the-art results. Perturbation studies further demonstrate substantial resilience to noisy or misleading external signals, mitigating the fragility of the search process. We believe these findings offer practical guidance for integrating LLM-powered agents into more complex interactive environments and enabling more autonomous decision-making.