MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers

📅 2025-06-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Single-retriever architectures in Retrieval-Augmented Generation (RAG) struggle to generalize across diverse query types. Method: This paper proposes a mixture of retrievers, a zero-shot hybrid retrieval framework that dynamically weights and combines heterogeneous retrievers for each query (sparse BM25, dense embedding-based retrievers, and human information sources) without manual retriever selection. Contribution/Results: With only 0.8B parameters in total, the mixture outperforms every individual retriever by 10.8% on average and larger 7B models by 3.9% on average. Treating specialized, non-oracle human information sources as retrievers yields a 58.9% relative improvement over simulated humans alone.

📝 Abstract

Retrieval-augmented Generation (RAG) is powerful, but its effectiveness hinges on which retrievers we use and how. Different retrievers offer distinct, often complementary signals: BM25 captures lexical matches; dense retrievers, semantic similarity. Yet in practice, we typically fix a single retriever based on heuristics, which fails to generalize across diverse information needs. Can we dynamically select and integrate multiple retrievers for each individual query, without the need for manual selection? In our work, we validate this intuition with quantitative analysis and introduce mixture of retrievers: a zero-shot, weighted combination of heterogeneous retrievers. Extensive experiments show that such mixtures are effective and efficient: Despite totaling just 0.8B parameters, this mixture outperforms every individual retriever and even larger 7B models by +10.8% and +3.9% on average, respectively. Further analysis also shows that this mixture framework can help incorporate specialized non-oracle human information sources as retrievers to achieve good collaboration, with a 58.9% relative performance improvement over simulated humans alone.
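To make "a zero-shot, weighted combination of heterogeneous retrievers" concrete, here is a minimal sketch of one generic way to fuse per-query scores from several retrievers. It is not the paper's weighting scheme, which the abstract does not spell out. The function names (`min_max_normalize`, `mix_retrievers`), the min-max normalization, and the hand-set example weights are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores to [0, 1] so retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tied: give each the same score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def mix_retrievers(
    scores_by_retriever: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Fuse per-retriever scores for ONE query into a single ranking.

    `weights` holds one non-negative weight per retriever for this query.
    How those weights are chosen is the hard part and is left to the caller
    (assumption: the paper derives them without supervision; here they are
    simply passed in).
    """
    total = sum(weights.get(name, 0.0) for name in scores_by_retriever)
    if total <= 0:
        raise ValueError("at least one retriever needs a positive weight")

    fused: Dict[str, float] = {}
    for name, scores in scores_by_retriever.items():
        w = weights.get(name, 0.0) / total  # normalize weights to sum to 1
        for doc, s in min_max_normalize(scores).items():
            # A document missing from a retriever's list contributes 0 there.
            fused[doc] = fused.get(doc, 0.0) + w * s

    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores for a single query (made-up numbers, not paper data).
scores = {
    "bm25": {"d1": 12.0, "d2": 4.0, "d3": 8.0},   # lexical match scores
    "dense": {"d1": 0.2, "d2": 0.9, "d4": 0.6},   # embedding similarities
}
weights = {"bm25": 0.25, "dense": 0.75}  # this query leans on semantic signal

ranked = mix_retrievers(scores, weights)
print([doc for doc, _ in ranked])  # ['d2', 'd4', 'd1', 'd3']
```

Under this sketch, a human information source could be added as one more entry in `scores_by_retriever` with its own weight, since the fusion step only needs a score per document from each source.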

Problem

Research questions and friction points this paper is trying to address.

Dynamically select and integrate multiple retrievers per query

Improve retrieval-augmented generation with diverse retriever signals

Combine sparse, dense, and human retrievers efficiently

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot weighted combination of heterogeneous retrievers

Dynamic selection of multiple retrievers per query

Incorporation of human information sources as retrievers

🔎 Similar Papers

AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment