🤖 AI Summary
In multi-hop question answering, conventional chain-of-thought (CoT) reasoning is inherently irreversible, leading to error accumulation and undermining both robustness and interpretability. To address this, we propose the first explicitly reversible multi-agent collaborative framework. It introduces a backtrackable reasoning control flow that enables dynamic error detection, backward error localization, and on-the-fly path correction, and it integrates text retrieval, information aggregation, cross-validation, and knowledge enhancement to support real-time verification and iterative refinement of reasoning steps. Evaluated on three major benchmarks, the framework achieves an average improvement of approximately 6% over strong baselines. Our core contribution is the first explicit incorporation of reversibility into multi-hop reasoning, establishing a novel QA paradigm that ensures both fault tolerance and full process interpretability.
📝 Abstract
Recent advances in large language models (LLMs) have significantly improved multi-hop question answering (QA) through direct Chain-of-Thought (CoT) reasoning. However, the irreversible nature of CoT leads to error accumulation, making it difficult to correct mistakes during multi-hop reasoning. This paper introduces ReAgent, a Reversible multi-Agent collaborative framework augmented with explicit backtracking mechanisms that enable reversible multi-hop reasoning. By incorporating text-based retrieval, information aggregation, and validation, our system can detect and correct errors mid-reasoning, leading to more robust and interpretable QA outcomes. The framework and experiments serve as a foundation for future work on error-tolerant QA systems. Empirical evaluations across three benchmarks demonstrate ReAgent's efficacy, yielding average improvements of about 6% over baseline models.
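To make the backtracking idea concrete, here is a minimal, hypothetical sketch of a reversible reasoning loop: a proposer suggests the next hop, a verifier cross-checks it against the trace so far, and a failed check rolls the trace back before retrying. The `Step`, `propose`, and `verify` names are illustrative assumptions, not the paper's actual agent interfaces, and this toy loop simplifies backward error localization to a single-step rollback.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One reasoning hop: a claim plus the evidence supporting it (toy schema)."""
    claim: str
    evidence: str


def solve(question, propose, verify, max_backtracks=3):
    """Toy backtrackable reasoning loop (illustrative, not ReAgent's implementation).

    propose(question, trace) -> next Step, or None when reasoning is complete.
    verify(step, trace)      -> True if the step is consistent with the trace.
    """
    trace = []
    backtracks = 0
    while True:
        step = propose(question, trace)
        if step is None:               # proposer signals the chain is complete
            return trace
        if verify(step, trace):        # cross-validation passed: commit the hop
            trace.append(step)
        else:                          # error detected: the fault may originate
            if trace:                  # earlier, so roll back the last committed
                trace.pop()            # hop instead of only discarding this one
            backtracks += 1
            if backtracks > max_backtracks:
                return trace           # give up after too many rollbacks


# Toy demo: a scripted proposer and a verifier that flags claims marked "wrong".
script = [Step("A wrote B", "doc1"),
          Step("B is wrong", "doc2"),
          Step("B was published in 1999", "doc3"),
          None]
proposals = iter(script)
trace = solve("demo question",
              propose=lambda q, t: next(proposals),
              verify=lambda s, t: "wrong" not in s.claim)
```

In this run the second proposal fails verification, so the loop discards it and rolls back the first hop before continuing; an irreversible CoT would instead have committed the faulty hop and built on it.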