Optimizing Compound Retrieval Systems

📅 2025-04-16
🤖 AI Summary
This work addresses the challenge of jointly optimizing multiple retrieval models—such as BM25 and large language models (LLMs)—within a single retrieval system. The authors propose compound retrieval systems, an end-to-end learnable generalization of cascading re-ranking that simultaneously optimizes *where* to invoke each component model (e.g., using LLMs for pairwise relevance comparisons rather than only conventional top-K re-ranking) and *how* to aggregate their predictions into a final ranking, jointly targeting a given ranking metric and a computational efficiency target. The approach can also be trained in a self-supervised manner, without human relevance annotations. Experimental results show that optimized compound retrieval systems provide better effectiveness–efficiency trade-offs than cascading baselines, establishing a broader, more flexible paradigm for hybrid retrieval systems.

📝 Abstract
Modern retrieval systems do not rely on a single ranking model to construct their rankings. Instead, they generally take a cascading approach where a sequence of ranking models are applied in multiple re-ranking stages. Thereby, they balance the quality of the top-K ranking with computational costs by limiting the number of documents each model re-ranks. However, the cascading approach is not the only way models can interact to form a retrieval system. We propose the concept of compound retrieval systems as a broader class of retrieval systems that apply multiple prediction models. This encapsulates cascading models but also allows other types of interactions than top-K re-ranking. In particular, we enable interactions with large language models (LLMs) which can provide relative relevance comparisons. We focus on the optimization of compound retrieval system design which uniquely involves learning where to apply the component models and how to aggregate their predictions into a final ranking. This work shows how our compound approach can combine the classic BM25 retrieval model with state-of-the-art (pairwise) LLM relevance predictions, while optimizing a given ranking metric and efficiency target. Our experimental results show optimized compound retrieval systems provide better trade-offs between effectiveness and efficiency than cascading approaches, even when applied in a self-supervised manner. With the introduction of compound retrieval systems, we hope to inspire the information retrieval field to more out-of-the-box thinking on how prediction models can interact to form rankings.
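The cascading baseline described in the abstract can be made concrete with a minimal sketch. This is an illustrative two-stage cascade, not code from the paper: both scorers are random stand-ins (the first playing the role of a cheap lexical model like BM25, the second an expensive re-ranker such as an LLM), and the cutoffs `k1`/`k2` are arbitrary.

```python
# Minimal sketch of a two-stage cascade: a cheap model ranks everything,
# an expensive model re-ranks only the survivors. Scorers are stand-ins.
import random

def first_stage_score(query, doc):
    # stand-in for a cheap lexical scorer such as BM25
    return random.random()

def second_stage_score(query, doc):
    # stand-in for an expensive model, e.g. an LLM-based re-ranker
    return random.random()

def cascade_rank(query, corpus, k1=100, k2=10):
    # Stage 1: cheap model scores the whole corpus, keep top-k1.
    stage1 = sorted(corpus, key=lambda d: first_stage_score(query, d),
                    reverse=True)[:k1]
    # Stage 2: expensive model re-ranks only those k1 documents, keep top-k2.
    return sorted(stage1, key=lambda d: second_stage_score(query, d),
                  reverse=True)[:k2]

docs = [f"doc{i}" for i in range(1000)]
top = cascade_rank("example query", docs)
```

The efficiency lever in this design is that the expensive scorer is called only `k1` times per query; the paper's point is that top-K re-ranking is only one of many possible interaction patterns.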
Problem

Research questions and friction points this paper is trying to address.

Optimizing retrieval systems with multiple ranking models
Combining BM25 and LLM predictions for better rankings
Improving trade-offs between effectiveness and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines BM25 with LLM relevance predictions
Optimizes model interaction and aggregation
Balances ranking quality and computational costs
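The two design decisions listed above—where to apply the LLM and how to aggregate its predictions—can be sketched as follows. This is a hedged illustration, not the paper's learned method: `bm25_score` and `llm_prefers` are random stand-ins, the pair-selection heuristic (spend the budget near the top of the BM25 ranking) and the fixed fusion weight `alpha` are assumptions standing in for the components the paper actually learns.

```python
# Illustrative compound ranking: budgeted pairwise LLM comparisons fused
# with a first-stage score. All scorers, budgets, and weights are stand-ins.
import itertools
import random

def bm25_score(query, doc):
    return random.random()  # stand-in for BM25

def llm_prefers(query, doc_a, doc_b):
    return random.random() > 0.5  # stand-in for a pairwise LLM judgment

def compound_rank(query, corpus, pair_budget=50, alpha=0.5):
    base = {d: bm25_score(query, d) for d in corpus}
    # "Where": spend the comparison budget on pairs near the top of the
    # BM25 ranking, where extra evidence is most likely to change the order.
    ranked = sorted(corpus, key=base.get, reverse=True)
    pairs = list(itertools.combinations(ranked[:20], 2))[:pair_budget]
    # "How": aggregate pairwise outcomes into per-document win rates.
    wins = {d: 0 for d in corpus}
    counts = {d: 0 for d in corpus}
    for a, b in pairs:
        winner = a if llm_prefers(query, a, b) else b
        wins[winner] += 1
        counts[a] += 1
        counts[b] += 1
    def fused(d):
        win_rate = wins[d] / counts[d] if counts[d] else 0.0
        return alpha * base[d] + (1 - alpha) * win_rate
    return sorted(corpus, key=fused, reverse=True)

docs = [f"doc{i}" for i in range(100)]
ranking = compound_rank("example query", docs)
```

In the paper both choices are optimized jointly against a ranking metric and an efficiency target; here they are hard-coded only to make the structure of the system visible.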