🤖 AI Summary
Large language models (LLMs) suffer factual hallucinations when their parametric knowledge is outdated or incomplete, and conventional retrieval-augmented generation (RAG) can introduce noise through ambiguous task decomposition and redundant retrieval. To address both problems, this work formulates retrieval-augmented reasoning as a Markov decision process (MDP), enabling dynamic, stepwise retrieval decisions conditioned on the current reasoning state. The proposed method combines iterative query decomposition with a learned retrieval decision at each step, allowing the model to switch adaptively between retrieval and parametric reasoning. Evaluated on multi-hop reasoning and open-domain question answering benchmarks, it improves answer accuracy by 21.99% over strong baselines while also improving retrieval efficiency, reducing hallucinations and making the reasoning process more controllable and interpretable.
📝 Abstract
Large Language Models (LLMs) have shown remarkable reasoning potential, yet they still suffer from severe factual hallucinations due to limitations in the timeliness, accuracy, and coverage of their parametric knowledge. Meanwhile, integrating reasoning with retrieval-augmented generation (RAG) remains challenging: ineffective task decomposition and redundant retrieval can introduce noise and degrade response quality. In this paper, we propose DeepRAG, a framework that models retrieval-augmented reasoning as a Markov Decision Process (MDP), enabling strategic and adaptive retrieval. By iteratively decomposing queries, DeepRAG dynamically determines whether to retrieve external knowledge or rely on parametric reasoning at each step. Experiments show that DeepRAG improves retrieval efficiency while boosting answer accuracy by 21.99%, demonstrating its effectiveness in optimizing retrieval-augmented reasoning.
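To make the described control flow concrete, here is a minimal sketch of an MDP-style retrieve-or-reason loop: the state is the question plus the steps taken so far, and at each step the agent either retrieves evidence or answers from parametric knowledge. This is not DeepRAG's actual implementation; every function name and the toy gating rule below are illustrative assumptions.

```python
# Hedged sketch of a DeepRAG-style retrieve-or-reason loop.
# All functions are illustrative stubs, not the paper's implementation.

def decompose(question, history):
    """Stub: emit the next subquery given the question and prior steps.

    This toy version produces a single subquery (the question itself)
    and then terminates; a real system would decompose iteratively.
    """
    return None if history else question

def should_retrieve(subquery):
    """Stub gating decision: retrieve only when the subquery looks like it
    needs fresh knowledge (a learned classifier in the real method)."""
    return "latest" in subquery.lower()

def retrieve(subquery):
    """Stub external retrieval step."""
    return f"[retrieved evidence for: {subquery}]"

def parametric_answer(subquery):
    """Stub answer from the model's own parametric knowledge."""
    return f"[parametric answer for: {subquery}]"

def deeprag_loop(question):
    """MDP-style episode: state = (question, history); action = retrieve or reason."""
    history = []
    while (sq := decompose(question, history)) is not None:
        step = retrieve(sq) if should_retrieve(sq) else parametric_answer(sq)
        history.append((sq, step))
    return history

# A time-sensitive question triggers retrieval under this toy gate;
# a stable-fact question is answered parametrically.
print(deeprag_loop("What is the latest LLM benchmark?"))
print(deeprag_loop("Who wrote Hamlet?"))
```

The key design point the sketch illustrates is that the retrieve/reason choice is made per subquery rather than once per question, which is what lets the method avoid redundant retrieval for facts the model already knows.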