🤖 AI Summary
Existing RAG systems suffer from tight coupling between retrieval and generation, poor interpretability, and weak hallucination mitigation in multi-hop question answering. This paper proposes a modular, composable RAG architecture that decouples query rewriting, question decomposition, retrieval decision-making, and answer verification into independent, swappable components. It introduces a parameterized module abstraction and a verification-first self-reflection mechanism, enabling dynamic reasoning-chain correction, iterative query rewriting, and re-retrieval. This design supports isolated component upgrades and systematic attribution analysis. Evaluated on four mainstream multi-hop QA benchmarks, the approach achieves up to a 15% absolute accuracy gain and reduces ungrounded answers by over 10% in low-quality retrieval settings (and by roughly 3% even with strong corpora). Ablation studies confirm that module contributions are both separable and additive.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems are increasingly diverse, yet many suffer from monolithic designs that tightly couple core functions like query reformulation, retrieval, reasoning, and verification. This limits their interpretability, systematic evaluation, and targeted improvement, especially for complex multi-hop question answering. We introduce ComposeRAG, a novel modular abstraction that decomposes RAG pipelines into atomic, composable modules. Each module, such as Question Decomposition, Query Rewriting, Retrieval Decision, and Answer Verification, acts as a parameterized transformation on structured inputs/outputs, allowing independent implementation, upgrade, and analysis. To enhance robustness against errors in multi-step reasoning, ComposeRAG incorporates a self-reflection mechanism that iteratively revisits and refines earlier steps upon verification failure. Evaluated on four challenging multi-hop QA benchmarks, ComposeRAG consistently outperforms strong baselines in both accuracy and grounding fidelity. Specifically, it achieves up to a 15% accuracy improvement over fine-tuning-based methods and up to a 5% gain over reasoning-specialized pipelines under identical retrieval conditions. Crucially, ComposeRAG significantly enhances grounding: its verification-first design reduces ungrounded answers by over 10% in low-quality retrieval settings, and by approximately 3% even with strong corpora. Comprehensive ablation studies validate the modular architecture, demonstrating distinct and additive contributions from each component. These findings underscore ComposeRAG's capacity to deliver flexible, transparent, scalable, and high-performing multi-hop reasoning with improved grounding and interpretability.
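The abstract's core idea, each module as a parameterized transformation over structured inputs/outputs, composed into a pipeline with a verification-first self-reflection loop, can be sketched minimally as below. This is an illustrative reconstruction, not the paper's actual API: all names (`State`, `run_pipeline`, the toy rewriter/retriever/verifier) are hypothetical stand-ins for the real Question Decomposition, Query Rewriting, Retrieval Decision, and Answer Verification components.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch: each module is a function State -> State, so modules
# can be implemented, swapped, and ablated independently, as the paper's
# modular abstraction describes.

@dataclass
class State:
    question: str
    query: str = ""
    passages: List[str] = field(default_factory=list)
    answer: str = ""
    verified: bool = False

Module = Callable[[State], State]

def rewrite_query(state: State) -> State:
    # Toy query rewriter: normalize the surface form of the question.
    state.query = state.question.strip().rstrip("?")
    return state

def retrieve(state: State) -> State:
    # Stand-in retriever: in a real system this would query a corpus.
    state.passages = state.query.split()
    return state

def generate_answer(state: State) -> State:
    # Stand-in generator: a real system would call an LLM over the passages.
    state.answer = " ".join(state.passages[:3]) or "unknown"
    return state

def verify(state: State) -> State:
    # Verification-first check: an answer with no supporting passages
    # counts as ungrounded and fails verification.
    state.verified = bool(state.passages)
    return state

def run_pipeline(question: str, modules: List[Module],
                 max_reflections: int = 2) -> State:
    """Run modules in order; on verification failure, self-reflect by
    revisiting earlier steps (here: revising the query) and re-running."""
    state = State(question=question)
    for _ in range(max_reflections + 1):
        for module in modules:
            state = module(state)
        if state.verified:
            break
        state.query = state.question  # toy reflection step before re-retrieval
    return state
```

Because each stage is just a `State -> State` function, an ablation study corresponds to dropping or replacing one element of the `modules` list, which is what makes per-component attribution straightforward in this style of design.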