Stop-RAG: Value-Based Retrieval Control for Iterative RAG

📅 2025-10-16
🤖 AI Summary
Problem: Iterative retrieval-augmented generation (RAG) for multi-hop question answering suffers from high latency, computational cost, and noise due to redundant retrieval steps. Method: We propose Stop-RAG, an adaptive stopping mechanism grounded in value learning. It formalizes iterative RAG as a finite-horizon Markov decision process (MDP) and introduces a learnable value controller, trained end-to-end with a full-trajectory Q(λ) objective, that decides whether to continue retrieval. Stop-RAG requires no modifications to the underlying large language model or retriever, which makes it compatible with black-box APIs and existing RAG pipelines. Results: On multi-hop QA benchmarks, Stop-RAG significantly outperforms fixed-step and prompt-driven stopping strategies, reducing average retrieval rounds by 38% and latency by 41%, lowering cost, and improving answer accuracy by 5.2%. These results indicate that value-driven adaptive termination is critical for both efficiency and robustness in RAG systems.

📝 Abstract
Iterative retrieval-augmented generation (RAG) enables large language models to answer complex multi-hop questions, but each additional loop increases latency, costs, and the risk of introducing distracting evidence, motivating the need for an efficient stopping strategy. Existing methods either use a predetermined number of iterations or rely on confidence proxies that poorly reflect whether more retrieval will actually help. We cast iterative RAG as a finite-horizon Markov decision process and introduce Stop-RAG, a value-based controller that adaptively decides when to stop retrieving. Trained with full-width forward-view Q(λ) targets from complete trajectories, Stop-RAG learns effective stopping policies while remaining compatible with black-box APIs and existing pipelines. On multi-hop question-answering benchmarks, Stop-RAG consistently outperforms both fixed-iteration baselines and prompting-based stopping with LLMs. These results highlight adaptive stopping as a key missing component in current agentic systems, and demonstrate that value-based control can improve the accuracy of RAG systems.
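
To make the idea of forward-view Q(λ) targets concrete, the sketch below computes λ-return targets for the "continue" action on one complete trajectory of a stop-or-continue MDP. It is a generic textbook-style λ-return with a max bootstrap, not necessarily the paper's exact objective. The function name `lambda_return_targets`, the per-step `stop_reward` list (the reward obtained if the system stopped and answered at that step), the current `q_continue` estimates, and the optional `step_cost` are all assumptions introduced for illustration.

```python
from typing import List


def lambda_return_targets(
    stop_reward: List[float],   # hypothetical: reward for stopping at step 0..T
    q_continue: List[float],    # hypothetical: current Q(s_t, continue), t = 0..T-1
    lam: float,                 # lambda in [0, 1]
    step_cost: float = 0.0,     # hypothetical per-retrieval penalty
) -> List[float]:
    """Forward-view lambda-return targets for the CONTINUE action at t = 0..T-1."""
    T = len(stop_reward) - 1
    if len(q_continue) != T:
        raise ValueError("q_continue must have one entry per non-final step")

    def state_value(k: int) -> float:
        # At the horizon the only action is STOP; earlier, take the better action.
        if k == T:
            return stop_reward[T]
        return max(stop_reward[k], q_continue[k])

    targets = []
    for t in range(T):
        horizon = T - t  # number of CONTINUE steps still available from t
        target = 0.0
        for n in range(1, horizon + 1):
            # n-step return: pay n retrieval costs, then bootstrap from step t + n.
            n_step_return = -step_cost * n + state_value(t + n)
            if n < horizon:
                weight = (1.0 - lam) * lam ** (n - 1)
            else:
                # The full-trajectory return absorbs the remaining weight.
                weight = lam ** (horizon - 1)
            target += weight * n_step_return
        targets.append(target)
    return targets
```

Under these assumptions, the target for the "stop" action at step t is simply `stop_reward[t]`, since stopping ends the episode. Setting `lam=0.0` reduces the continue target to a one-step bootstrap, while `lam=1.0` uses only the return from continuing all the way to the horizon.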
Problem

Research questions and friction points this paper is trying to address.

Optimizing stopping strategy for iterative RAG systems
Reducing latency and costs in multi-hop question answering
Improving accuracy through adaptive retrieval control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Value-based controller adaptively stops retrieval iterations
Trained with Q(λ) targets from complete trajectories
Compatible with black-box APIs and existing pipelines
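
The points above can be pictured as a thin control loop wrapped around an unmodified retriever and generator. The following is a minimal sketch under stated assumptions, not the authors' implementation: `run_with_stopping`, `retrieve`, `q_values`, and `answer` are hypothetical names, and the retriever and LLM are treated as opaque callables, which is what compatibility with black-box APIs would allow.

```python
from typing import Callable, List, Tuple

# Hypothetical black-box interfaces: the controller only calls them.
Retriever = Callable[[str, List[str]], List[str]]            # (question, evidence) -> new docs
ValueModel = Callable[[str, List[str]], Tuple[float, float]]  # -> (Q_stop, Q_continue)
Generator = Callable[[str, List[str]], str]                   # (question, evidence) -> answer


def run_with_stopping(
    question: str,
    retrieve: Retriever,
    q_values: ValueModel,
    answer: Generator,
    max_rounds: int,
) -> Tuple[str, int]:
    """Retrieve iteratively until the value model prefers stopping or the horizon is hit."""
    evidence: List[str] = []
    rounds = 0
    while rounds < max_rounds:
        q_stop, q_continue = q_values(question, evidence)
        if q_stop >= q_continue:
            break  # stopping is estimated to be at least as good as another round
        evidence.extend(retrieve(question, evidence))
        rounds += 1
    return answer(question, evidence), rounds
```

Here the stopping decision is made before each retrieval round, including the first. A pipeline that always retrieves once before consulting the controller would move the first `retrieve` call ahead of the loop.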
Jaewan Park
Seoul National University
Solbee Cho
Seoul National University
Jay-Yoon Lee
Seoul National University
Machine Learning, Artificial Intelligence, Knowledge Injection, Structured prediction