🤖 AI Summary
In multi-hop question answering, existing retrieval methods struggle to ensure both evidence accuracy and completeness. This paper proposes a large language model–based agent-oriented iterative retrieval framework. Its core innovation lies in three specialized agents—for question decomposition, context selection, and missing-evidence completion—that together form a closed-loop collaborative retrieval mechanism. This mechanism improves retrieval precision while actively filtering noise and suppressing redundancy. The framework enables structured, interpretable multi-hop information aggregation and significantly reduces reliance on long contexts. Evaluated on four benchmarks—HotpotQA, 2WikiMultiHopQA, MuSiQue, and MultiHopRAG—the framework consistently outperforms strong baselines: downstream QA models achieve higher answer accuracy from fewer retrieved passages, demonstrating both the effectiveness and the generalizability of the approach.
📝 Abstract
Retrieval plays a central role in multi-hop question answering (QA), where answering complex questions requires gathering multiple pieces of evidence. We introduce an Agentic Retrieval System that leverages large language models (LLMs) in a structured loop to retrieve relevant evidence with high precision and recall. Our framework consists of three specialized agents: a Question Analyzer that decomposes a multi-hop question into sub-questions, a Selector that identifies the most relevant context for each sub-question (focusing on precision), and an Adder that brings in any missing evidence (focusing on recall). The iterative interaction between the Selector and the Adder yields a compact yet comprehensive set of supporting passages. In particular, the loop achieves higher retrieval accuracy while filtering out distracting content, enabling downstream QA models to surpass full-context answer accuracy while processing significantly less irrelevant information. Experiments on four multi-hop QA benchmarks -- HotpotQA, 2WikiMultiHopQA, MuSiQue, and MultiHopRAG -- demonstrate that our approach consistently outperforms strong baselines.
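The Analyzer → Selector → Adder loop described above can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: each agent in the paper is an LLM call, whereas here the agents are stand-in keyword-overlap heuristics, and all function names (`analyze`, `select`, `add_missing`, `agentic_retrieve`) are our own placeholders.

```python
def analyze(question):
    """Question Analyzer: decompose a multi-hop question into sub-questions.
    Toy stand-in: split on ' and '; the paper uses an LLM for this step."""
    return [q.strip() for q in question.split(" and ")]

def select(sub_question, passages):
    """Selector: keep only passages relevant to the sub-question (precision).
    Toy stand-in: keep passages sharing at least one word with the sub-question."""
    terms = set(sub_question.lower().split())
    return [p for p in passages if terms & set(p.lower().split())]

def add_missing(sub_questions, evidence, corpus):
    """Adder: fetch evidence for sub-questions still uncovered (recall)."""
    added = []
    for sq in sub_questions:
        if not select(sq, evidence):            # no supporting passage yet
            added.extend(select(sq, corpus))    # pull candidates from the corpus
    return added

def agentic_retrieve(question, corpus, max_rounds=3):
    """Closed loop: Selector prunes distractors, Adder restores missing hops."""
    subs = analyze(question)
    evidence = []
    for _ in range(max_rounds):
        # Selector pass: drop passages irrelevant to every sub-question.
        evidence = [p for p in evidence if any(select(sq, [p]) for sq in subs)]
        # Adder pass: bring in evidence for any uncovered sub-question.
        missing = add_missing(subs, evidence, corpus)
        if not missing:                          # every hop covered: converged
            break
        evidence += [p for p in missing if p not in evidence]
    return evidence
```

On a toy corpus, the loop terminates once every sub-question has at least one supporting passage, returning a compact evidence set while leaving distractor passages behind.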