FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG

📅 2024-10-14
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
To address the inefficiency in resource utilization and the trade-off between accuracy and latency inherent in flat, single-stage retrieval within Retrieval-Augmented Generation (RAG), this paper proposes a coarse-to-fine progressive multi-stage retrieval paradigm. We introduce a novel “granularity–quantity–capacity” tripartite coordination mechanism: initially retrieving a broad, coarse-grained candidate set with low computational capacity; then progressively refining granularity, reducing candidate quantity, and increasing model capacity per stage. This is enabled by dynamic granularity control, capacity-adaptive retriever collaboration, and latency-aware scheduling. Extensive evaluation across multiple RAG benchmarks demonstrates that our method achieves retrieval accuracy on par with state-of-the-art single-stage baselines while reducing end-to-end inference latency by nearly 40%. It thus effectively breaks the efficiency–performance trade-off bottleneck of single-stage retrieval, exhibiting strong generalizability and practical engineering applicability.

📝 Abstract
Retrieval-Augmented Generation (RAG) prevails in Large Language Models. It mainly consists of retrieval and generation: retrieval modules (a.k.a. retrievers) find useful information to support generation modules (a.k.a. generators). As such, generator performance largely depends on the effectiveness and efficiency of the retrievers. However, the prevailing retrieval paradigm remains flat: it treats retrieval as a one-off procedure at constant granularity. Despite its effectiveness, we argue that it suffers from two limitations: (1) flat retrieval places a heavy burden on a single retriever; (2) constant granularity limits the ceiling of retrieval performance. In this work, we propose a progressive retrieval paradigm with coarse-to-fine granularity for RAG, termed FunnelRAG, to balance effectiveness and efficiency. Specifically, FunnelRAG establishes a progressive retrieval pipeline by coordinating coarse-to-fine granularity, large-to-small quantity, and low-to-high capacity, which relieves the burden on any single retriever and raises the ceiling of retrieval performance. Extensive experiments show that FunnelRAG achieves comparable retrieval performance while reducing time overhead by nearly 40 percent.
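The coarse-to-fine idea above can be sketched as a staged pipeline: each stage narrows the candidate quantity, refines the granularity of the retrieval units, and (in a real system) hands off to a higher-capacity scorer. The sketch below is illustrative only; the function names, the cheap lexical scorer, and the splitting heuristic are hypothetical placeholders, not the paper's implementation.

```python
# Illustrative sketch of a coarse-to-fine progressive retrieval pipeline in
# the spirit of FunnelRAG. All names and scoring functions here are
# hypothetical stand-ins, not the authors' actual components.

def overlap_score(query: str, text: str) -> float:
    """Cheap lexical scorer: fraction of query terms present in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def split_units(units: list[str], factor: int) -> list[str]:
    """Refine granularity: split each unit into roughly `factor` smaller chunks."""
    out = []
    for text in units:
        words = text.split()
        step = max(len(words) // factor, 1)
        for i in range(0, len(words), step):
            out.append(" ".join(words[i:i + step]))
    return out

def funnel_retrieve(query: str, corpus: list[str],
                    stage_k=(4, 2, 1), split_factor=2) -> list[str]:
    """Each stage keeps fewer candidates (stage_k shrinks) at finer granularity.

    In practice each stage would also swap in a higher-capacity model
    (e.g. sparse retrieval -> bi-encoder -> cross-encoder reranker);
    here one cheap scorer stands in for all stages.
    """
    candidates = corpus
    for stage, k in enumerate(stage_k):
        ranked = sorted(candidates,
                        key=lambda t: overlap_score(query, t),
                        reverse=True)
        candidates = ranked[:k]          # large-to-small quantity
        if stage < len(stage_k) - 1:
            candidates = split_units(candidates, split_factor)  # coarse-to-fine
    return candidates
```

Under this sketch, the expensive final-stage scorer only ever sees a handful of fine-grained units, which is where the paper's reported time savings would come from: the bulk of the corpus is filtered by the cheap early stages.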
Problem

Research questions and friction points this paper is trying to address.

RAG
resource efficiency
retrieval performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

FunnelRAG
Retrieval-Augmented Generation
Efficiency Optimization
Xinping Zhao
Harbin Institute of Technology (Shenzhen)
Yan Zhong
Peking University
Zetian Sun
Harbin Institute of Technology (Shenzhen)
Xinshuo Hu
Harbin Institute of Technology (Shenzhen)
Large Language Model · Text Generation · Truthfulness
Zhenyu Liu
Harbin Institute of Technology (Shenzhen)
Dongfang Li
Harbin Institute of Technology (Shenzhen)
Baotian Hu
Harbin Institute of Technology (Shenzhen)
LLM · MLLM · NLP
Min Zhang
Harbin Institute of Technology (Shenzhen)