Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing

📅 2025-05-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-round retrieval-augmented generation (RAG) loses accuracy when systems either keep retrieving blindly or stop prematurely, and existing remedies rely on costly human annotations or underperform. This paper proposes SIM-RAG, a self-introspective multi-round retrieval control framework built on self-practice: the RAG system first synthesizes its own training data by augmenting question-answer pairs with introspective inner-monologue reasoning traces, then a lightweight information-sufficiency Critic is trained on these labeled trajectories to give the system a form of "self-doubt"; at inference, the Critic guides retrieval decisions via in-context reinforcement learning. Crucially, SIM-RAG requires no human-annotated process supervision, modifies neither the LLM nor the search engine, and introduces only lightweight modules—reducing training data cost by over 90%. Extensive evaluation across major RAG benchmarks demonstrates significant improvements in both answer accuracy and retrieval efficiency.

📝 Abstract
Retrieval Augmented Generation (RAG) has shown strong capability in enhancing language models' knowledge and reducing AI generative hallucinations, driving its widespread use. However, complex tasks requiring multi-round retrieval remain challenging, and early attempts tend to be overly optimistic without a good sense of self-skepticism. Current multi-round RAG systems may continue searching even when enough information has already been retrieved, or they may provide incorrect answers without having sufficient information or knowledge. Existing solutions either require large amounts of expensive human-labeled process supervision data or lead to subpar performance. This paper aims to address these limitations by introducing a new framework, SIM-RAG, to explicitly enhance RAG systems' self-awareness and multi-round retrieval capabilities. To train SIM-RAG, we first let a RAG system self-practice multi-round retrieval, augmenting existing question-answer pairs with intermediate inner monologue reasoning steps to generate synthetic training data. For each pair, the system may explore multiple retrieval paths, which are labeled as successful if they reach the correct answer and unsuccessful otherwise. Using this data, we train a lightweight information sufficiency Critic. At inference time, the Critic evaluates whether the RAG system has retrieved sufficient information at each round, guiding retrieval decisions and improving system-level self-awareness through in-context reinforcement learning. Experiments across multiple prominent RAG benchmarks show that SIM-RAG is an effective multi-round RAG solution. Furthermore, this framework is system-efficient, adding a lightweight component to RAG without requiring modifications to existing LLMs or search engines, and data-efficient, eliminating the need for costly human-annotated mid-step retrieval process supervision data.
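The inference-time loop the abstract describes can be sketched in a few lines: retrieve, draft an answer, and let the Critic decide whether to stop or search again. This is a minimal illustration, not the paper's implementation — `retrieve`, `generate_answer`, and `critic_is_sufficient` are hypothetical stubs standing in for the search engine, the LLM, and the trained Critic.

```python
def retrieve(question, evidence):
    # Stub retriever: a real system would query a search engine here.
    return [f"doc_{len(evidence)}"]

def generate_answer(question, evidence):
    # Stub generator: a real system would prompt the LLM with the evidence.
    return f"answer after {len(evidence)} docs"

def critic_is_sufficient(question, evidence, draft):
    # Stub Critic: the trained classifier would judge information sufficiency.
    return len(evidence) >= 2

def multi_round_rag(question, max_rounds=5):
    """Retrieve in rounds; stop as soon as the Critic deems the evidence sufficient."""
    evidence = []
    for _ in range(max_rounds):
        evidence.extend(retrieve(question, evidence))   # next retrieval round
        draft = generate_answer(question, evidence)     # provisional answer
        if critic_is_sufficient(question, evidence, draft):
            return draft                                # Critic: enough information
    return generate_answer(question, evidence)          # budget exhausted; answer anyway
```

The key design point is that the stopping decision lives in a separate lightweight module, so the underlying LLM and retriever stay frozen.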
Problem

Research questions and friction points this paper is trying to address.

Enhancing self-awareness in multi-round RAG systems
Optimizing retrieval decisions to avoid unnecessary searches
Reducing reliance on expensive human-labeled training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-practicing RAG generates synthetic training data
Lightweight Critic evaluates information sufficiency
In-context reinforcement learning enhances self-awareness
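The self-practice step above reduces to outcome-level labeling: each explored retrieval path is marked positive if its final answer matches the gold answer. A minimal sketch, assuming a hypothetical `explore` that samples rollouts (the real system would sample paths with the LLM):

```python
def explore(question):
    # Stub rollout sampler: yields (retrieval_trajectory, final_answer) pairs.
    yield (["search A"], "Paris")
    yield (["search B", "search C"], "Lyon")

def label_trajectories(qa_pairs):
    """Turn self-practice rollouts into (question, trajectory, label) Critic training data."""
    data = []
    for question, gold in qa_pairs:
        for trajectory, final_answer in explore(question):
            label = 1 if final_answer == gold else 0  # outcome supervision only
            data.append((question, trajectory, label))
    return data
```

Because labels come from answer correctness alone, no human needs to annotate which intermediate retrieval step was right — which is where the data-efficiency claim comes from.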