🤖 AI Summary
This work addresses the lack of a unified analytical framework for retrieval and reasoning pipelines in multi-hop question answering, a gap that has hindered systematic comparison across methods. We propose a four-axis design framework that treats the execution procedure as the unit of analysis, covering execution plans, index structures, next-step control strategies, and stop/continue criteria. Through a literature review and a synthesis of reported ablations, we map prominent approaches, including RAG and agent-based systems, onto this framework using benchmarks such as HotpotQA, revealing recurring trade-offs among effectiveness, efficiency, and evidence faithfulness. The framework organizes existing design choices, identifies reproducible empirical trends, and highlights open challenges, including structure-aware planning and transferable control strategies.
📝 Abstract
Multi-hop question answering (QA) requires systems to iteratively retrieve evidence and reason across multiple hops. While recent RAG and agentic methods report strong results, the underlying retrieval–reasoning *process* is often left implicit, making procedural choices hard to compare across model families. This survey takes the execution procedure as the unit of analysis and introduces a four-axis framework covering (A) overall execution plan, (B) index structure, (C) next-step control (strategies and triggers), and (D) stop/continue criteria. Using this schema, we map representative multi-hop QA systems and synthesize reported ablations and tendencies on standard benchmarks (e.g., HotpotQA, 2WikiMultiHopQA, MuSiQue), highlighting recurring trade-offs among effectiveness, efficiency, and evidence faithfulness. We conclude with open challenges for retrieval–reasoning agents, including structure-aware planning, transferable control policies, and robust stopping under distribution shift.
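To make the four-axis schema concrete, the comparison the abstract describes can be pictured as placing each system on axes A–D. The sketch below is a minimal illustration in Python; the field names and the example placements are our own shorthand, not the survey's official taxonomy or its actual system labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    """One row of a survey-style comparison: a QA system placed on axes A-D."""
    name: str
    execution_plan: str    # Axis A: e.g. single pass vs. interleaved loop
    index_structure: str   # Axis B: e.g. flat passage index vs. graph index
    next_step_control: str  # Axis C: what triggers the next retrieval hop
    stop_criterion: str    # Axis D: when to stop retrieving and answer


# Illustrative placements (hypothetical, for exposition only):
PROFILES = [
    SystemProfile("vanilla RAG", "single-pass retrieve-then-read",
                  "flat dense index", "none (one retrieval)",
                  "fixed: answer after one hop"),
    SystemProfile("iterative agent", "interleaved retrieve/reason loop",
                  "flat dense index", "model proposes next sub-query",
                  "model emits a final-answer action"),
]


def column(axis: str) -> dict[str, str]:
    """Project all systems onto one axis, like a column of a survey table."""
    return {p.name: getattr(p, axis) for p in PROFILES}
```

For example, `column("execution_plan")` groups systems by Axis A alone, which is the kind of per-axis slicing that makes procedural choices comparable across otherwise different model families.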