AI Summary
Large reasoning models (LRMs) rely on parameterized knowledge, leading to insufficient factual accuracy; existing retrieval-augmented generation (RAG) methods often induce over-reasoning and exhibit poor robustness. To address this, we propose a knowledge-guided iterative RAG framework that employs reinforcement learning (RL) to dynamically decide between "search" and "terminate" actions, enabling efficient multi-hop question answering under strict constraints on reasoning chain length. Our contributions are threefold: (1) the first data construction paradigm with an explicit upper bound on reasoning length; (2) an interpretable, introspective action space supporting reasoning trajectory optimization and error localization; and (3) an integrated architecture combining LRMs, RL policies, an iterative RAG engine, and an observation-feedback regulation mechanism. Experiments demonstrate substantial improvements over baselines on multi-hop QA benchmarks, with enhanced factual consistency, robustness, and error correction capability.
Abstract
Large Reasoning Models (LRMs) exhibit remarkable reasoning abilities but rely primarily on parametric knowledge, limiting factual accuracy. While recent works equip reinforcement learning (RL)-based LRMs with retrieval capabilities, they suffer from overthinking and lack robustness in reasoning, reducing their effectiveness in question answering (QA) tasks. To address this, we propose ReaRAG, a factuality-enhanced reasoning model that explores diverse queries without excessive iterations. Our solution includes a novel data construction framework with an upper bound on the reasoning chain length. Specifically, we first leverage an LRM to generate deliberate thinking, then select an action from a predefined action space (Search and Finish). For the Search action, a query is executed against the RAG engine, and the result is returned as an observation to guide subsequent reasoning steps. This process iterates until a Finish action is chosen. Benefiting from ReaRAG's strong reasoning capabilities, our approach outperforms existing baselines on multi-hop QA. Further analysis highlights its strong reflective ability to recognize errors and refine its reasoning trajectory. Our study enhances LRMs' factuality while effectively integrating robust reasoning for Retrieval-Augmented Generation (RAG).
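The think-act-observe loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`choose_action`, `run_search`), the toy knowledge base, and the hard-coded policy are all hypothetical stand-ins for the LRM and the RAG engine.

```python
MAX_STEPS = 5  # upper bound on the reasoning chain length

# Toy stand-in for the RAG engine: query substring -> retrieved passage.
KB = {
    "capital of France": "Paris is the capital of France.",
    "population of Paris": "Paris has about 2.1 million residents.",
}

def run_search(query):
    """Hypothetical retriever: return the first passage matching the query."""
    for key, passage in KB.items():
        if key in query:
            return passage
    return "No relevant passage found."

def choose_action(question, observations):
    """Hypothetical LRM policy: Search until enough evidence, then Finish.

    Returns (action, argument), where the argument is a query for Search
    or the final answer for Finish.
    """
    if len(observations) < 2:
        queries = ["capital of France", "population of Paris"]
        return ("Search", queries[len(observations)])
    return ("Finish", "Paris, with about 2.1 million residents.")

def rearag_loop(question):
    """Iterate think -> act -> observe until Finish or the step budget runs out."""
    observations = []
    for _ in range(MAX_STEPS):  # the length bound prevents over-reasoning
        action, arg = choose_action(question, observations)
        if action == "Finish":
            return arg
        # Search: query the RAG engine; the observation guides later steps.
        observations.append(run_search(arg))
    return "Reasoning budget exhausted."

print(rearag_loop("What is the population of the capital of France?"))
```

In a real system, `choose_action` is the trained model selecting from the action space and `run_search` is the retrieval backend; the fixed `MAX_STEPS` budget mirrors the explicit upper bound on reasoning chain length described above.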