🤖 AI Summary
Iterative retrieval-augmented generation (RAG) for multi-hop question answering suffers from high latency, computational cost, and noise due to redundant retrieval steps. Method: We propose Stop-RAG, an adaptive stopping mechanism grounded in value learning. It formalizes iterative RAG as a finite-horizon Markov decision process (MDP) and introduces a learnable value controller, trained end-to-end with a full-trajectory Q(λ) objective, that decides whether to continue retrieval. Stop-RAG requires no modifications to the underlying large language model or retriever, making it compatible with black-box APIs and existing RAG pipelines. Results: On multi-hop QA benchmarks, Stop-RAG significantly outperforms fixed-step and prompt-driven stopping strategies, reducing average retrieval rounds by 38% and latency by 41%, lowering overall cost, while improving answer accuracy by 5.2%. These results demonstrate that value-driven adaptive termination is critical for enhancing both efficiency and robustness in RAG systems.
📝 Abstract
Iterative retrieval-augmented generation (RAG) enables large language models to answer complex multi-hop questions, but each additional loop increases latency, cost, and the risk of introducing distracting evidence, motivating the need for an efficient stopping strategy. Existing methods either use a predetermined number of iterations or rely on confidence proxies that poorly reflect whether more retrieval will actually help. We cast iterative RAG as a finite-horizon Markov decision process and introduce Stop-RAG, a value-based controller that adaptively decides when to stop retrieving. Trained with full-width forward-view Q(λ) targets from complete trajectories, Stop-RAG learns effective stopping policies while remaining compatible with black-box APIs and existing pipelines. On multi-hop question-answering benchmarks, Stop-RAG consistently outperforms both fixed-iteration baselines and prompting-based stopping with LLMs. These results highlight adaptive stopping as a key missing component in current agentic systems, and demonstrate that value-based control can improve the accuracy of RAG systems.
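To make the training objective concrete, the forward-view Q(λ) targets mentioned above can be computed with a single backward pass over a complete trajectory. The sketch below is a minimal, generic illustration of standard Q(λ) target computation, not the paper's actual implementation; the function name, the `rewards`/`q_next` inputs, and the discount/decay values are all illustrative assumptions.

```python
def q_lambda_targets(rewards, q_next, gamma=0.99, lam=0.9):
    """Compute forward-view Q(lambda) targets from a complete trajectory.

    rewards[t]: reward received after the action at step t
                (e.g. continue retrieval vs. stop).
    q_next[t]:  bootstrap estimate max_a Q(s_{t+1}, a);
                use 0.0 at the terminal step.
    Returns a list of targets, one per step.
    """
    T = len(rewards)
    targets = [0.0] * T
    g = 0.0
    # Recurse backward: G_t = r_t + gamma * ((1-lam) * maxQ(s_{t+1}) + lam * G_{t+1})
    for t in reversed(range(T)):
        if t == T - 1:
            g = rewards[t] + gamma * q_next[t]  # terminal: q_next[t] is 0.0
        else:
            g = rewards[t] + gamma * ((1 - lam) * q_next[t] + lam * g)
        targets[t] = g
    return targets
```

With λ = 0 this reduces to one-step bootstrapped targets, and with λ = 1 it recovers the full Monte Carlo return, which is why full-trajectory targets can propagate the final answer-quality signal back to early stopping decisions.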