Shifting from Ranking to Set Selection for Retrieval Augmented Generation

📅 2025-07-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing RAG systems mostly rerank retrieved passages by their individual relevance, which often fails to provide the combined information coverage that complex queries, such as multi-hop questions, require. This work proposes treating retrieval as set selection: instead of scoring passages one at a time, the system chooses a set of passages that together satisfy the query's information needs. The proposed method, SETR, uses chain-of-thought (CoT) reasoning to make those information requirements explicit and then selects the passages that jointly cover them. On multi-hop RAG benchmarks, SETR outperforms both proprietary LLM-based rerankers and open-source baselines in retrieval quality and answer correctness. The code is publicly available.

📝 Abstract

Retrieval in Retrieval-Augmented Generation (RAG) must ensure that retrieved passages are not only individually relevant but also collectively form a comprehensive set. Existing approaches primarily rerank top-k passages based on their individual relevance, often failing to meet the information needs of complex queries in multi-hop question answering. In this work, we propose a set-wise passage selection approach and introduce SETR, which explicitly identifies the information requirements of a query through Chain-of-Thought reasoning and selects an optimal set of passages that collectively satisfy those requirements. Experiments on multi-hop RAG benchmarks show that SETR outperforms both proprietary LLM-based rerankers and open-source baselines in terms of answer correctness and retrieval quality, providing an effective and efficient alternative to traditional rerankers in RAG systems. The code is available at https://github.com/LGAI-Research/SetR
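
To make the contrast between individual reranking and set selection concrete, here is a toy sketch in Python. It is not SETR's method. SETR uses an LLM that reasons with chain-of-thought to identify the query's requirements and pick passages. This sketch instead assumes each passage has already been hand-labeled with the requirements it covers, and it substitutes a simple greedy set-cover heuristic for the selection step. The function names, passage IDs, scores, and requirement labels are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy data for a two-hop query such as
# "Where was the director of film X born?".
# Requirement labels are hand-assigned here for illustration; SETR instead
# infers the query's requirements with chain-of-thought reasoning.
REQUIREMENTS: Set[str] = {"director_of_X", "birthplace_of_director"}

# Individual relevance scores, as a pointwise reranker might assign them.
SCORES: Dict[str, float] = {"p1": 0.95, "p2": 0.93, "p3": 0.60, "p4": 0.40}

# Which requirements each passage helps answer.
COVERAGE: Dict[str, Set[str]] = {
    "p1": {"director_of_X"},           # "X was directed by Y."
    "p2": {"director_of_X"},           # "Y directed X in 1999." (redundant with p1)
    "p3": {"birthplace_of_director"},  # "Y was born in Z."
    "p4": set(),                       # topically related but uninformative
}


def top_k_by_relevance(scores: Dict[str, float], k: int) -> List[str]:
    """Pointwise baseline: keep the k passages with the highest individual scores."""
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


def greedy_set_selection(
    coverage: Dict[str, Set[str]],
    scores: Dict[str, float],
    requirements: Set[str],
    k: int,
) -> List[str]:
    """Set-wise heuristic: repeatedly add the passage that covers the most
    still-unmet requirements, breaking ties by individual relevance."""
    selected: List[str] = []
    unmet = set(requirements)
    while unmet and len(selected) < k:
        candidates = [pid for pid in coverage if pid not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda pid: (len(coverage[pid] & unmet), scores[pid]))
        if not coverage[best] & unmet:
            break  # no remaining passage adds new information
        selected.append(best)
        unmet -= coverage[best]
    return selected


def coverage_ratio(selected: List[str], coverage: Dict[str, Set[str]], requirements: Set[str]) -> float:
    """Fraction of the query's requirements covered by the selected set."""
    covered = set().union(*(coverage[pid] for pid in selected)) & requirements
    return len(covered) / len(requirements)


if __name__ == "__main__":
    ranked = top_k_by_relevance(SCORES, k=2)
    chosen = greedy_set_selection(COVERAGE, SCORES, REQUIREMENTS, k=2)
    print(ranked, coverage_ratio(ranked, COVERAGE, REQUIREMENTS))  # ['p1', 'p2'] 0.5
    print(chosen, coverage_ratio(chosen, COVERAGE, REQUIREMENTS))  # ['p1', 'p3'] 1.0
```

With a budget of two passages, the relevance-only baseline keeps two near-duplicate passages about the director and never retrieves the birthplace. The set-wise heuristic gives up a higher-scoring but redundant passage in favor of one that fills the missing hop.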

Problem

Research questions and friction points this paper is trying to address.

Ensuring retrieved passages collectively cover query needs

Improving multi-hop QA by optimizing set selection

Replacing individual ranking with comprehensive set retrieval

Innovation

Methods, ideas, or system contributions that make the work stand out.

Set-wise passage selection for RAG

Chain-of-Thought reasoning to identify a query's information requirements

Optimal set selection for comprehensive answers