🤖 AI Summary
Pointwise methods for LLM-based text ranking suffer from high bias, while pairwise approaches incur prohibitive computational overhead (O(n²)). To resolve this trade-off, this paper proposes RefRank, a zero-shot prompt-based reference-guided ranking framework. Its core innovation is the introduction of a fixed, semantically anchored reference document, enabling indirect pairwise comparisons between each candidate and the reference, thereby achieving linear time complexity (O(n)). Furthermore, RefRank incorporates a shared-reference mechanism and a multi-reference weighted aggregation strategy to enhance robustness and generalization. Extensive experiments across multiple benchmark datasets and diverse LLMs show that RefRank significantly outperforms pointwise baselines and achieves accuracy at least on par with pairwise methods, at a fraction of the inference cost.
📝 Abstract
Large Language Models (LLMs) have demonstrated exceptional performance in the task of text ranking for information retrieval. While Pointwise ranking approaches offer computational efficiency by scoring documents independently, they often yield biased relevance estimates due to the lack of inter-document comparisons. In contrast, Pairwise methods improve ranking accuracy by explicitly comparing document pairs, but suffer from substantial computational overhead with quadratic complexity ($O(n^2)$). To address this trade-off, we propose **RefRank**, a simple and effective comparative ranking method based on a fixed reference document. Instead of comparing all document pairs, RefRank prompts the LLM to evaluate each candidate relative to a shared reference anchor. By selecting a reference anchor that encapsulates the core query intent, RefRank implicitly captures relevance cues, enabling indirect comparison between documents via this common anchor. This reduces computational cost to linear time ($O(n)$) while, importantly, preserving the advantages of comparative evaluation. To further enhance robustness, we aggregate multiple RefRank outputs using a weighted averaging scheme across different reference choices. Experiments on several benchmark datasets and with various LLMs show that RefRank significantly outperforms Pointwise baselines and achieves performance at least on par with Pairwise approaches at significantly lower computational cost.
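The ranking procedure described above (score each candidate against a shared reference, then aggregate over several reference choices with weights) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `compare` is a hypothetical stand-in for the LLM prompt that scores a candidate relative to the reference, and the toy word-overlap scorer below merely makes the sketch runnable.

```python
from typing import Callable, Dict, List

def refrank(
    query: str,
    candidates: List[str],
    references: List[str],
    ref_weights: List[float],
    compare: Callable[[str, str, str], float],
) -> List[str]:
    """Rank candidates by a weighted average of reference-relative scores.

    compare(query, candidate, reference) stands in for an LLM call that
    returns a preference score for the candidate relative to the reference.
    Each candidate is scored once per reference, so the cost is O(n * k)
    for n candidates and k references -- linear in n, unlike the O(n^2)
    cost of exhaustive pairwise comparison.
    """
    scores: Dict[str, float] = {c: 0.0 for c in candidates}
    total_w = sum(ref_weights)
    for ref, w in zip(references, ref_weights):
        for cand in candidates:
            scores[cand] += w * compare(query, cand, ref)
    # Sort descending by the weight-normalized aggregated score.
    return sorted(candidates, key=lambda c: scores[c] / total_w, reverse=True)

# Toy demo: a mock "LLM" that scores by word overlap with query + reference.
def mock_compare(query: str, cand: str, ref: str) -> float:
    target = set(query.split()) | set(ref.split())
    return float(len(set(cand.split()) & target))

ranked = refrank(
    "capital of france",
    ["paris is the capital of france", "berlin is in germany", "france exports wine"],
    ["the capital city of france"],
    [1.0],
    mock_compare,
)
print(ranked[0])  # -> "paris is the capital of france"
```

With a single reference and unit weight, this reduces to the basic RefRank comparison; adding more (reference, weight) pairs implements the multi-reference aggregation the abstract describes.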