LSTM-based Selective Dense Text Retrieval Guided by Sparse Lexical Retrieval

📅 2025-02-15
🤖 AI Summary
This work addresses the low efficiency and high memory overhead of fusing dense and sparse retrieval. It proposes CluSD, a cluster-based selective dense retrieval framework guided by sparse retrieval (e.g., BM25). In a two-stage selection process, a lightweight LSTM model quickly identifies relevant embedding clusters. CluSD then runs dense retrieval and block-level disk I/O only for the selected clusters. This partial dense retrieval adds minimal memory overhead (<8%). On the MS MARCO and BEIR benchmarks, CluSD retrieves 2.3× faster than baseline methods while maintaining state-of-the-art (SOTA) mAP and Recall@100. The approach balances retrieval accuracy, latency, and resource efficiency, which advances the practical deployment of hybrid retrieval systems.
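A minimal sketch can illustrate partial dense retrieval with block-level disk I/O. It is hypothetical and is not the authors' implementation. It assumes that cluster selection has already produced a list of cluster IDs (`selected`). Each cluster's embeddings are stored contiguously, so one seek and one read fetch a whole cluster. Dense and sparse scores are fused by linear interpolation with weight `alpha`, and a missing score is treated as 0. The summary specifies none of these details: the on-disk layout, the helper names (`write_clustered_index`, `read_cluster`, `selective_dense_fusion`) and the fusion formula are all assumptions.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Sequence, Tuple

DIM = 2  # hypothetical embedding dimension for a toy index
FLOAT_BYTES = 4  # embeddings stored as little-endian float32

Layout = Dict[int, Tuple[int, List[int]]]  # cluster id -> (byte offset, doc ids)


def write_clustered_index(
    clusters: Dict[int, List[Tuple[int, List[float]]]]
) -> Tuple[bytes, Layout]:
    """Store each cluster's embeddings contiguously so a cluster is one block."""
    buf = io.BytesIO()
    layout: Layout = {}
    for cid, docs in clusters.items():
        layout[cid] = (buf.tell(), [doc for doc, _ in docs])
        for _, emb in docs:
            buf.write(struct.pack(f"<{DIM}f", *emb))
    return buf.getvalue(), layout


def read_cluster(f: BinaryIO, offset: int, n_docs: int) -> List[Tuple[float, ...]]:
    """Fetch all embeddings of one cluster with a single seek and a single read."""
    step = DIM * FLOAT_BYTES
    f.seek(offset)
    raw = f.read(n_docs * step)
    return [struct.unpack_from(f"<{DIM}f", raw, i * step) for i in range(n_docs)]


def selective_dense_fusion(
    query: Sequence[float],
    sparse: Dict[int, float],
    f: BinaryIO,
    layout: Layout,
    selected: List[int],
    alpha: float = 0.5,
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Dense-score only documents in selected clusters, then fuse with sparse scores."""
    dense: Dict[int, float] = {}
    for cid in selected:
        offset, doc_ids = layout[cid]
        for doc, emb in zip(doc_ids, read_cluster(f, offset, len(doc_ids))):
            dense[doc] = sum(q * e for q, e in zip(query, emb))  # inner product
    # Assumed fusion: linear interpolation, with a missing score treated as 0.
    fused = {
        doc: alpha * sparse.get(doc, 0.0) + (1 - alpha) * dense.get(doc, 0.0)
        for doc in set(sparse) | set(dense)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy usage: cluster 0 holds docs 0 and 2, cluster 1 holds docs 1 and 3.
data, layout = write_clustered_index(
    {0: [(0, [0.75, 0.25]), (2, [0.5, 0.0])], 1: [(1, [0.25, 0.75]), (3, [0.0, 1.0])]}
)
ranking = selective_dense_fusion(
    [1.0, 0.0], {0: 0.5, 1: 0.75}, io.BytesIO(data), layout, selected=[0]
)
print(ranking)  # → [(0, 0.625), (1, 0.375), (2, 0.25)]
```

The sketch reads only the selected clusters. Embeddings for every other document (here, doc 3) never leave disk, which is where this style of selective retrieval saves memory and I/O.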

📝 Abstract
This paper studies fast fusion of dense retrieval and sparse lexical retrieval, and proposes a cluster-based selective dense retrieval method called CluSD guided by sparse lexical retrieval. CluSD takes a lightweight cluster-based approach and exploits the overlap of sparse retrieval results and embedding clusters in a two-stage selection process with an LSTM model to quickly identify relevant clusters while incurring limited extra memory space overhead. CluSD triggers partial dense retrieval and performs cluster-based block disk I/O if needed. This paper evaluates CluSD and compares it with several baselines for searching in-memory and on-disk MS MARCO and BEIR datasets.
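The two-stage selection described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical sketch and not the authors' model. Stage 1 ranks clusters by how many top-k sparse results fall into each one (the overlap signal) and keeps the top `n_candidates`. Stage 2 runs a tiny LSTM over the ranked candidates and emits a selection probability for each one. The following are all assumptions: the per-cluster features (overlap fraction and summed sparse score), `n_candidates`, `threshold`, and the `TinyLSTM` class. Its weights are random placeholders rather than a trained model.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyLSTM:
    """Single-layer LSTM with a sigmoid output per step (hypothetical, untrained)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: input, forget, output, candidate cell.
        self.gates = {
            name: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for name in ("i", "f", "o", "g")
        }
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.n_hidden = n_hidden

    def run(self, seq: List[List[float]]) -> List[float]:
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        probs = []
        for x in seq:
            pre = {
                name: [a + u + b for a, u, b in zip(matvec(W, x), matvec(U, h), bias)]
                for name, (W, U, bias) in self.gates.items()
            }
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            probs.append(sigmoid(sum(w * hj for w, hj in zip(self.w_out, h))))
        return probs


def select_clusters(
    sparse_top: List[Tuple[int, float]],
    doc_to_cluster: Dict[int, int],
    lstm: TinyLSTM,
    n_candidates: int = 4,
    threshold: float = 0.5,
) -> Tuple[List[int], List[float], List[int]]:
    # Stage 1: overlap between top-k sparse results and each embedding cluster.
    overlap: Dict[int, Tuple[int, float]] = {}
    for doc, score in sparse_top:
        cid = doc_to_cluster[doc]
        hits, total = overlap.get(cid, (0, 0.0))
        overlap[cid] = (hits + 1, total + score)
    candidates = sorted(overlap, key=lambda cid: overlap[cid], reverse=True)[:n_candidates]
    # Stage 2: the LSTM reads hypothetical per-cluster features in rank order.
    feats = [[overlap[cid][0] / len(sparse_top), overlap[cid][1]] for cid in candidates]
    probs = lstm.run(feats)
    selected = [cid for cid, p in zip(candidates, probs) if p >= threshold]
    return candidates, probs, selected


candidates, probs, selected = select_clusters(
    [(0, 9.0), (1, 8.0), (2, 7.5), (3, 7.0), (4, 6.0)],
    {0: 10, 1: 10, 2: 11, 3: 10, 4: 12},
    TinyLSTM(n_in=2, n_hidden=4),
)
print(candidates)  # → [10, 11, 12]
print(probs, selected)
```

Because the LSTM reads the candidates as a ranked sequence, each cluster's decision can depend on the clusters ranked before it. In practice the weights would have to be trained before the probabilities mean anything.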

Problem

Fusion of dense and sparse retrieval
Cluster-based selective dense retrieval
Efficient memory and disk I/O optimization

Innovation

LSTM-guided dense retrieval
Cluster-based selective retrieval
Sparse lexical retrieval fusion

Yingrui Yang
Coursera Inc., USA

Parker Carlson
University of California at Santa Barbara, USA

Yifan Qiao
Postdoc at University of California, Berkeley
Operating Systems, Cloud Computing, ML Systems

Wentai Xie
University of California at Santa Barbara, USA

Shanxiu He
University of California at Santa Barbara, USA

Tao Yang
University of California at Santa Barbara, USA