🤖 AI Summary
This work addresses the challenge of keeping reasoning consistent across steps in multi-hop question answering with retrieval-augmented generation (RAG), where inaccurate query decomposition and error propagation often degrade performance. The authors propose RT-RAG, a framework that structures the reasoning process by explicitly constructing a reasoning tree to decompose the original question. The decomposition distinguishes core queries, known entities, and unknown entities, and a consensus-based step selects the tree to use. RT-RAG then traverses the tree bottom-up, iteratively rewriting and refining queries to collect high-quality evidence. By curbing decomposition errors and error propagation, the method improves reasoning coherence and answer accuracy, with reported gains of 7.0% in F1 and 6.0% in exact match (EM) over state-of-the-art methods in multi-hop QA experiments.
📝 Abstract
Retrieval-Augmented Generation (RAG) has proven effective at enhancing large language models (LLMs) for complex multi-hop question answering (QA). Current iterative approaches to multi-hop QA rely predominantly on the LLM to self-guide and plan multi-step exploration paths during retrieval, which makes it hard to maintain reasoning coherence across steps because of inaccurate query decomposition and error propagation. To address these issues, we introduce Reasoning Tree Guided RAG (RT-RAG), a novel hierarchical framework for complex multi-hop QA. RT-RAG systematically decomposes multi-hop questions into explicit reasoning trees. It reduces inaccurate decomposition through structured entity analysis and consensus-based tree selection, yielding trees that clearly separate core queries, known entities, and unknown entities. A bottom-up traversal strategy then applies iterative query rewriting and refinement to collect high-quality evidence, thereby mitigating error propagation. Comprehensive experiments show that RT-RAG outperforms state-of-the-art methods by 7.0% F1 and 6.0% EM, demonstrating its effectiveness in complex multi-hop QA.
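
To make the two ideas in the abstract concrete, the following is a minimal Python sketch of consensus-based tree selection followed by a bottom-up traversal with query rewriting. It is an illustration under stated assumptions, not the authors' implementation: the `Node` structure, the `signature` helper, the use of a simple majority vote as the consensus rule, the `{0}`-style placeholders for child answers, and the toy lookup table standing in for retrieval plus an LLM are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """One sub-question in a reasoning tree (hypothetical structure)."""
    core_query: str                       # "{0}", "{1}", ... stand for child answers
    known_entities: tuple[str, ...] = ()  # entities already given in the question
    unknown_entity: str = ""              # what this node has to resolve
    children: list[Node] = field(default_factory=list)


def signature(node: Node) -> tuple:
    """Hashable description of a tree, used to compare candidate decompositions."""
    return (node.core_query, node.known_entities, node.unknown_entity,
            tuple(signature(child) for child in node.children))


def select_by_consensus(candidates: list[Node]) -> Node:
    """Assumed consensus rule: keep the decomposition most candidates agree on."""
    counts = Counter(signature(tree) for tree in candidates)
    best_signature, _ = counts.most_common(1)[0]
    return next(tree for tree in candidates if signature(tree) == best_signature)


def resolve(node: Node, answer_fn: Callable[[str], str]) -> str:
    """Bottom-up traversal: answer the children, then rewrite and answer this query."""
    child_answers = [resolve(child, answer_fn) for child in node.children]
    rewritten_query = node.core_query.format(*child_answers)
    return answer_fn(rewritten_query)


def two_hop_tree() -> Node:
    """Decomposition of: 'What is the capital of the country with the Eiffel Tower?'"""
    leaf = Node("In which country is the Eiffel Tower located?",
                known_entities=("Eiffel Tower",), unknown_entity="country")
    return Node("What is the capital of {0}?", unknown_entity="capital",
                children=[leaf])


def flat_tree() -> Node:
    """A poorer candidate that does not decompose the question at all."""
    return Node("What is the capital of the country with the Eiffel Tower?",
                known_entities=("Eiffel Tower",), unknown_entity="capital")


# Toy stand-in for "retrieve evidence, then ask the LLM" (an assumption).
TOY_KB = {
    "In which country is the Eiffel Tower located?": "France",
    "What is the capital of France?": "Paris",
}

# Three sampled decompositions; two of them agree on the two-hop tree.
candidates = [two_hop_tree(), flat_tree(), two_hop_tree()]
chosen = select_by_consensus(candidates)
answer = resolve(chosen, TOY_KB.__getitem__)
print(answer)
```

In this sketch the leaf is answered first, and its answer is substituted into the parent's core query before the parent is answered, so each retrieval query is fully specified by the time it is issued. A real system would replace `TOY_KB.__getitem__` with a retrieval and generation call and would likely need a looser notion of agreement between candidate trees than exact structural equality.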