🤖 AI Summary
In multi-hop question answering, existing retrieval methods struggle to ensure both evidence accuracy and completeness. This paper proposes a large language model–based agent-oriented iterative retrieval framework. Its core innovation lies in three specialized agents—for question decomposition, context selection, and missing-evidence completion—that together form a closed-loop collaborative retrieval mechanism. This mechanism improves retrieval precision while actively filtering noise and suppressing redundancy. The framework enables structured, interpretable multi-hop information aggregation and significantly reduces reliance on long contexts. Evaluated on four benchmarks—HotpotQA, 2WikiMultiHopQA, MuSiQue, and MultiHopRAG—the framework consistently outperforms strong baselines: downstream QA models achieve higher answer accuracy from fewer retrieved passages, demonstrating both the effectiveness and the generalizability of the approach.
📝 Abstract
Retrieval plays a central role in multi-hop question answering (QA), where answering complex questions requires gathering multiple pieces of evidence. We introduce an Agentic Retrieval System that leverages large language models (LLMs) in a structured loop to retrieve relevant evidence with high precision and recall. Our framework consists of three specialized agents: a Question Analyzer that decomposes a multi-hop question into sub-questions, a Selector that identifies the most relevant context for each sub-question (focusing on precision), and an Adder that brings in any missing evidence (focusing on recall). The iterative interaction between the Selector and the Adder yields a compact yet comprehensive set of supporting passages. In particular, the loop achieves higher retrieval accuracy while filtering out distracting content, enabling downstream QA models to surpass full-context answer accuracy while processing significantly less irrelevant information. Experiments on four multi-hop QA benchmarks -- HotpotQA, 2WikiMultiHopQA, MuSiQue, and MultiHopRAG -- demonstrate that our approach consistently outperforms strong baselines.
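The Analyzer → Selector → Adder loop described above can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: each agent in the paper is an LLM call, whereas here the agents are stand-in keyword-overlap heuristics, and all function names (`analyze`, `select`, `add_missing`, `agentic_retrieve`) are our own placeholders.

```python
def analyze(question):
    """Question Analyzer: decompose a multi-hop question into sub-questions.
    Toy stand-in: split on ' and '; the paper uses an LLM for this step."""
    return [q.strip() for q in question.split(" and ")]

def select(sub_question, passages):
    """Selector: keep only passages relevant to the sub-question (precision).
    Toy stand-in: keep passages sharing at least one word with the sub-question."""
    terms = set(sub_question.lower().split())
    return [p for p in passages if terms & set(p.lower().split())]

def add_missing(sub_questions, evidence, corpus):
    """Adder: fetch evidence for sub-questions still uncovered (recall)."""
    added = []
    for sq in sub_questions:
        if not select(sq, evidence):            # no supporting passage yet
            added.extend(select(sq, corpus))    # pull candidates from the corpus
    return added

def agentic_retrieve(question, corpus, max_rounds=3):
    """Closed loop: Selector prunes distractors, Adder restores missing hops."""
    subs = analyze(question)
    evidence = []
    for _ in range(max_rounds):
        # Selector pass: drop passages irrelevant to every sub-question.
        evidence = [p for p in evidence if any(select(sq, [p]) for sq in subs)]
        # Adder pass: bring in evidence for any uncovered sub-question.
        missing = add_missing(subs, evidence, corpus)
        if not missing:                          # every hop covered: converged
            break
        evidence += [p for p in missing if p not in evidence]
    return evidence
```

On a toy corpus, the loop terminates once every sub-question has at least one supporting passage, returning a compact evidence set while leaving distractor passages behind.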