🤖 AI Summary
This work addresses cross-document reasoning in complex multi-hop question answering, where existing retrieval-augmented generation (RAG) approaches often fall short. The authors propose ConRAG, a consensus-driven multi-view RAG framework that introduces a consensus mechanism into multi-hop RAG for the first time. The framework jointly optimizes query decomposition and corpus structuring, and it integrates three complementary evidence signals (relational, entity-based, and textual) for unified retrieval. By combining multi-view retrieval, knowledge graph augmentation, and large language model reasoning, the method outperforms all baselines on the HotpotQA, 2WikiMultihopQA, and MuSiQue benchmarks, with average gains of up to 26.9% over standard RAG, and it sets a new state-of-the-art result on MuSiQue with the Gemma-4-31B model.
📝 Abstract
Retrieval-augmented generation (RAG) has emerged as a promising paradigm for enhancing large language models (LLMs) on multi-hop question answering (QA), which requires reasoning over evidence from multiple documents. Current multi-hop RAG methods generally focus on either query-side task decomposition or corpus-side knowledge graph construction. Despite their progress, these methods still struggle to achieve satisfactory performance on complex multi-hop QA tasks. To address this gap, we propose ConRAG, a consensus-driven multi-view RAG framework that effectively boosts LLMs on complex multi-hop QA. The core of ConRAG is to systematically optimize both the query and corpus sides and to leverage multi-view evidence (relation, entity, and text signals) for more accurate retrieval. Extensive experiments on three multi-hop QA benchmarks show that ConRAG consistently outperforms all baselines by a clear margin, e.g., with average performance gains of up to 26.9% over vanilla RAG, and enables Gemma-4-31B to achieve a new state-of-the-art result on the challenging MuSiQue benchmark.
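The abstract does not specify how the relation, entity, and text views are combined, so the following is only an illustrative sketch of what consensus over multi-view retrieval could look like, not ConRAG's actual mechanism. It swaps in reciprocal rank fusion (a standard rank-aggregation technique) and keeps only passages retrieved by at least `min_views` views. The function name `consensus_fuse`, the parameters `k` and `min_views`, and the passage ids are all hypothetical.

```python
from collections import defaultdict


def consensus_fuse(views, k=60, min_views=2):
    """Fuse ranked passage lists from several retrieval views.

    Hypothetical sketch: uses reciprocal rank fusion plus a simple
    agreement threshold; the paper's consensus mechanism may differ.

    views: dict mapping a view name to a list of passage ids, best first.
    k: smoothing constant of reciprocal rank fusion.
    min_views: a passage must appear in at least this many views.
    """
    scores = defaultdict(float)  # fused score per passage
    support = defaultdict(int)   # number of views that retrieved the passage
    for ranking in views.values():
        # dict.fromkeys drops duplicate ids while preserving order
        for rank, pid in enumerate(dict.fromkeys(ranking), start=1):
            scores[pid] += 1.0 / (k + rank)
            support[pid] += 1
    agreed = [pid for pid in scores if support[pid] >= min_views]
    # Highest fused score first; ties broken by passage id for determinism
    return sorted(agreed, key=lambda pid: (-scores[pid], pid))


# Toy rankings for the three views named in the abstract (assumed data)
views = {
    "relation": ["p3", "p1", "p7"],
    "entity": ["p1", "p3", "p9"],
    "text": ["p1", "p4", "p3"],
}
print(consensus_fuse(views))  # ['p1', 'p3']
```

In this toy run only `p1` and `p3` are retrieved by more than one view, so passages that a single view surfaced in isolation are filtered out before they reach the LLM.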