🤖 AI Summary
This study identifies a performance breakpoint and a semantic failure mode in neural re-rankers (e.g., ColBERTv2, RankT5) for large-scale document re-ranking: retrieval quality degrades significantly once the candidate set exceeds ~1,000 documents, with MRR@10 dropping by 12.7% on average and 38% of top-scoring results exhibiting neither lexical overlap nor semantic similarity with the query. Through systematic ablation and scaling experiments, augmented with semantic-similarity and lexical-matching analyses, we empirically challenge the widely held assumption that re-rankers universally outperform first-stage retrievers. Our key contributions are: (1) establishing the effective scale boundary for re-rankers; (2) revealing their propensity for relevance misjudgment on ultra-large candidate lists; and (3) providing theoretical grounding, practical guidance, and critical deployment warnings for integrating re-ranking modules into large-scale retrieval systems.
📝 Abstract
Rerankers, typically cross-encoders, are often used to re-score the documents retrieved by cheaper initial IR systems. This is because, though expensive, rerankers are assumed to be more effective. We challenge this assumption by measuring reranker performance for full retrieval, not just re-scoring first-stage retrieval. Our experiments reveal a surprising trend: the best existing rerankers provide diminishing returns when scoring progressively more documents and actually degrade quality beyond a certain limit. In fact, in this setting, rerankers can frequently assign high scores to documents with no lexical or semantic overlap with the query. We hope that our findings will spur future research to improve reranking.
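The abstract contrasts two settings: the usual pipeline, where a cheap first-stage retriever narrows the corpus before an expensive reranker re-scores the survivors, versus scoring progressively more documents with the reranker itself. The sketch below illustrates that pipeline shape only; the function names and both scoring functions are illustrative stand-ins (a cross-encoder would be a model call on each query-document pair), not the systems evaluated in the paper.

```python
# Toy sketch of a two-stage retrieve-then-rerank pipeline.
# Both scorers are cheap stand-ins so the example is self-contained;
# in practice rerank_score would be an expensive cross-encoder call.

def first_stage_score(query: str, doc: str) -> int:
    """Cheap lexical score: count of doc tokens appearing in the query."""
    q_terms = set(query.lower().split())
    return sum(t in q_terms for t in doc.lower().split())

def rerank_score(query: str, doc: str) -> float:
    """Stand-in for an expensive per-pair reranker score.

    A real cross-encoder jointly encodes (query, doc); here we just
    length-normalize the lexical overlap to keep the sketch runnable.
    """
    return first_stage_score(query, doc) / (1 + len(doc.split()))

def retrieve_then_rerank(query, corpus, k_first=100, k_final=10):
    # Stage 1: cheap scoring over the whole corpus, keep top-k candidates.
    candidates = sorted(
        corpus, key=lambda d: first_stage_score(query, d), reverse=True
    )[:k_first]
    # Stage 2: expensive re-scoring of the candidate set only.
    # The paper's finding: growing k_first does not monotonically help,
    # and past a point reranking quality actually degrades.
    reranked = sorted(
        candidates, key=lambda d: rerank_score(query, d), reverse=True
    )
    return reranked[:k_final]

corpus = [
    "neural rerankers score query document pairs",
    "first stage retrieval uses inverted indexes",
    "cats sleep most of the day",
]
print(retrieve_then_rerank("rerankers score documents", corpus,
                           k_first=2, k_final=1))
# → ['neural rerankers score query document pairs']
```

The paper's full-retrieval setting corresponds to setting `k_first = len(corpus)`, i.e., letting the reranker see every document rather than only a small candidate list.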