AI Summary
This paper addresses the limited modeling capacity of single dense representations in ad-hoc passage retrieval. We propose a two-stage cascaded retrieval framework that synergistically integrates lexical features with neural relevance signals. In the first stage, an efficient dense retriever performs initial candidate retrieval; in the second stage, a learning-to-rank (LTR) model based on decision tree ensembles re-ranks candidates using both MS-MARCO-pretrained dense vectors and 253-dimensional hand-crafted lexical features. The approach achieves substantial improvements in ranking quality (up to +11% in nDCG@10) over pure dense retrieval baselines, while incurring only a modest 4.3% average query latency overhead. Our core contribution is a scalable, efficient, and high-accuracy hybrid-ranking paradigm, empirically demonstrating the indispensable value of explicit lexical features in neural retrieval re-ranking.
Abstract
We investigate the exploitation of both lexical and neural relevance signals for ad-hoc passage retrieval. Our exploration involves a large-scale training dataset in which dense neural representations of MS-MARCO queries and passages are complemented and integrated with 253 hand-crafted lexical features extracted from the same corpus. The blending of the relevance signals from the two different groups of features is learned by a classical Learning-to-Rank (LTR) model based on a forest of decision trees. To evaluate our solution, we employ a pipelined architecture in which a dense neural retriever serves as the first stage, performing a nearest-neighbor search over the neural representations of the documents. Our LTR model acts as the second stage, re-ranking the set of candidates retrieved by the first stage to enhance effectiveness. The results of reproducible experiments conducted with state-of-the-art dense retrievers on publicly available resources show that the proposed solution significantly enhances end-to-end ranking performance while only minimally impacting efficiency. Specifically, we achieve a boost in nDCG@10 of up to 11% with an increase in average query latency of only 4.3%. This confirms the advantage of seamlessly combining two distinct families of signals that mutually contribute to retrieval effectiveness.
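The two-stage cascade described above can be sketched in a few lines of NumPy. This is a minimal toy illustration, not the paper's implementation: the random vectors stand in for MS-MARCO encoder embeddings, the random matrix stands in for the 253 hand-crafted lexical features, and a fixed linear blend stands in for the trained decision-tree LTR model (e.g., a LambdaMART ensemble); all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): in the paper, passage and query vectors come
# from an MS-MARCO-pretrained dense encoder; here they are random.
NUM_PASSAGES, DIM, NUM_LEX = 1000, 64, 253
passage_vecs = rng.standard_normal((NUM_PASSAGES, DIM)).astype(np.float32)
passage_vecs /= np.linalg.norm(passage_vecs, axis=1, keepdims=True)
# 253 hand-crafted lexical features per passage (random placeholders).
lex_feats = rng.standard_normal((NUM_PASSAGES, NUM_LEX)).astype(np.float32)

def first_stage(query_vec, k=100):
    """Stage 1, dense retrieval: exact inner-product nearest-neighbor search.

    Returns the top-k candidate passage ids (best first) and the full
    dense score vector, which the re-ranker reuses as a feature.
    """
    scores = passage_vecs @ query_vec
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])], scores

def second_stage(candidates, dense_scores, ltr_model):
    """Stage 2: re-rank candidates with an LTR model over the concatenation
    of the dense relevance score and the lexical feature block."""
    feats = np.hstack([dense_scores[candidates, None], lex_feats[candidates]])
    ltr_scores = ltr_model(feats)
    return candidates[np.argsort(-ltr_scores)]

# Placeholder "model": a fixed linear blend standing in for the paper's
# forest of decision trees trained with an LTR objective.
weights = rng.standard_normal(1 + NUM_LEX).astype(np.float32)
linear_ltr = lambda feats: feats @ weights

query_vec = rng.standard_normal(DIM).astype(np.float32)
query_vec /= np.linalg.norm(query_vec)
candidates, dense_scores = first_stage(query_vec, k=100)
final_ranking = second_stage(candidates, dense_scores, linear_ltr)
```

The design point the sketch captures is that the expensive feature blending runs only over the k candidates from the first stage, which is why the re-ranking step adds so little to average query latency.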