🤖 AI Summary
This work addresses three key challenges in zero-shot document ranking with large language models (LLMs): input length constraints, sequence-order sensitivity, and cost-performance imbalance. Methodologically, it introduces a tournament-inspired multi-stage grouping and points-based fusion framework for ranking, adapting sports tournament principles to information retrieval. To mitigate context-length limitations, documents are partitioned into parallel groups across multiple stages; comparative reasoning is performed within each group, followed by weighted aggregation of tournament-style points. This design enhances ranking robustness and consistency. Evaluated on the TREC Deep Learning and BEIR benchmarks, the approach achieves state-of-the-art zero-shot performance, outperforming existing baselines while reducing inference latency and API call costs. Its core contribution is a scalable, robust, and efficient LLM-based zero-shot ranking paradigm.
📝 Abstract
Large Language Models (LLMs) are increasingly employed in zero-shot document ranking, yielding commendable results. However, several significant challenges persist when using LLMs for ranking: (1) LLMs are constrained by limited input length, precluding them from processing a large number of documents simultaneously; (2) the output document sequence is influenced by the input order of documents, resulting in inconsistent ranking outcomes; (3) achieving a balance between cost and ranking performance is challenging. To tackle these issues, we introduce a novel document ranking method called TourRank, which is inspired by sports tournaments such as the FIFA World Cup. Specifically, we 1) overcome the input length limitation and reduce ranking latency by incorporating a multi-stage grouping strategy similar to the parallel group stage of sports tournaments; 2) improve ranking performance and robustness to input order by using a points system to ensemble multiple ranking results. We test TourRank with different LLMs on the TREC DL datasets and the BEIR benchmark. The experimental results demonstrate that TourRank delivers state-of-the-art performance at a modest cost. The code of TourRank is available at https://github.com/chenyiqun/TourRank.
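The tournament idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `rank_group` callable stands in for an LLM prompt that orders a small group of documents, and the group size, number of advancing documents, and point values are illustrative assumptions. Each simulated tournament shuffles the candidate pool (addressing input-order sensitivity), splits it into small groups that fit a context window, awards points to group winners, and advances them to the next stage; point totals across tournaments give the final ranking.

```python
import random

def tourrank(documents, query, rank_group, group_size=4, advance=2,
             n_tournaments=3, seed=0):
    """Toy tournament-style ranking sketch (not the paper's exact algorithm).

    rank_group(query, group) -> the group ordered best-first; it stands in
    for a single LLM call over a small group of documents. Each tournament
    shuffles the pool, splits it into groups, gives one point to each of the
    top `advance` documents per group, and advances only those documents.
    Points are summed across tournaments; the final order is by total points.
    """
    rng = random.Random(seed)
    points = {doc: 0 for doc in documents}
    for _ in range(n_tournaments):
        pool = list(documents)
        rng.shuffle(pool)  # vary input order across tournaments
        while len(pool) > advance:
            next_pool = []
            for i in range(0, len(pool), group_size):
                group = pool[i:i + group_size]
                winners = rank_group(query, group)[:advance]
                for doc in winners:
                    points[doc] += 1  # one point per stage survived
                next_pool.extend(winners)
            pool = next_pool
    return sorted(documents, key=lambda doc: points[doc], reverse=True)

# Toy stand-in for the LLM: prefer documents sharing more words with the query.
def toy_rank_group(query, group):
    terms = set(query.split())
    return sorted(group, key=lambda doc: len(terms & set(doc.split())),
                  reverse=True)

docs = ["apple pie recipe", "banana bread", "apple tart recipe", "car repair"]
ranked = tourrank(docs, "apple recipe", toy_rank_group, group_size=2, advance=1)
print(ranked)  # a relevant "apple ... recipe" document ranks first
```

Because points are accumulated across several shuffled tournaments rather than taken from a single pass, one unlucky grouping or input order has limited effect on the final list, which is the robustness property the abstract highlights.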