Blending Learning to Rank and Dense Representations for Efficient and Effective Cascades

📅 2025-10-18
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the limited modeling capacity of single dense representations in ad-hoc passage retrieval. The authors propose a two-stage cascaded retrieval framework that integrates lexical features with neural relevance signals. In the first stage, an efficient dense retriever performs initial candidate retrieval; in the second stage, a learning-to-rank (LTR) model based on decision-tree ensembles re-ranks the candidates using both MS MARCO dense vectors and 253 hand-crafted lexical features. The approach achieves substantial improvements in ranking quality (up to +11% in nDCG@10) over pure dense-retrieval baselines, while incurring only a modest 4.3% average query-latency overhead. The core contribution is a scalable, efficient, and accurate hybrid-ranking paradigm that empirically demonstrates the value of explicit lexical features in neural retrieval re-ranking.

๐Ÿ“ Abstract
We investigate the exploitation of both lexical and neural relevance signals for ad-hoc passage retrieval. Our exploration involves a large-scale training dataset in which dense neural representations of MS-MARCO queries and passages are complemented and integrated with 253 hand-crafted lexical features extracted from the same corpus. Blending of the relevance signals from the two different groups of features is learned by a classical Learning-to-Rank (LTR) model based on a forest of decision trees. To evaluate our solution, we employ a pipelined architecture where a dense neural retriever serves as the first stage and performs a nearest-neighbor search over the neural representations of the documents. Our LTR model then acts as the second stage, re-ranking the set of candidates retrieved by the first stage to enhance effectiveness. The results of reproducible experiments conducted with state-of-the-art dense retrievers on publicly available resources show that the proposed solution significantly enhances the end-to-end ranking performance while only minimally impacting efficiency. Specifically, we achieve a boost in nDCG@10 of up to 11% with an increase in average query latency of only 4.3%. This confirms the advantage of seamlessly combining two distinct families of signals that mutually contribute to retrieval effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Combining lexical and neural signals for passage retrieval
Using Learning-to-Rank to blend dense and hand-crafted features
Improving ranking effectiveness while maintaining retrieval efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Blends lexical and neural relevance signals for retrieval
Uses Learning-to-Rank model with decision trees
Employs two-stage dense retriever and reranker pipeline
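The two-stage pipeline listed above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the brute-force dot-product search stands in for a real nearest-neighbor index, the linear blend stands in for the decision-tree LTR forest, and all function names, the toy feature dimensionality, and the example scorer are hypothetical.

```python
import numpy as np

def first_stage_dense(query_vec, doc_vecs, k):
    """Stage 1: dense retrieval via inner-product nearest-neighbor search.

    A brute-force dot product over all document vectors; a production system
    would use an ANN index instead.
    """
    scores = doc_vecs @ query_vec
    top = np.argsort(-scores)[:k]          # indices of the k highest-scoring docs
    return top, scores[top]

def second_stage_ltr(candidates, dense_scores, lexical_feats, ltr_model):
    """Stage 2: re-rank stage-1 candidates by blending signal families.

    Joins each candidate's dense score with its hand-crafted lexical features
    and scores the combined vector with `ltr_model` (any callable over a
    feature matrix; in the paper this role is played by a decision-tree
    ensemble trained with LTR objectives).
    """
    feats = np.column_stack([dense_scores, lexical_feats[candidates]])
    rerank_scores = ltr_model(feats)
    return candidates[np.argsort(-rerank_scores)]
```

As a usage sketch, a query vector aligned with document 0 but whose lexical features favor document 2 lets the second stage overturn the dense-only ordering, which is exactly the effect the cascade is designed to exploit.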