BLITZRANK: Principled Zero-shot Ranking Agents with Tournament Graphs

📅 2026-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a tournament-graph framework for zero-shot reranking. Existing methods either rely on heuristics that do not fully exploit the preference signal in each multi-document comparison, or exploit it inefficiently. The approach treats each k-wise comparison as a complete tournament over the k documents, aggregates these tournaments into a global preference graph, and infers additional orderings via transitive closure without further LLM calls. It also formalizes when an item's rank is certifiably determined, schedules queries greedily to maximize information gain toward the top-m items, and collapses non-transitive preferences (cycles) into equivalence classes that yield tiered rankings. Evaluated on 14 benchmarks with 5 large language models, the method Pareto-dominates existing approaches: it matches or exceeds their accuracy while using 25–40% fewer tokens than comparable methods and 7× fewer than pairwise reranking at near-identical quality.
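The cycle-handling step can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `closure` and `tiers`, the toy items, and the rule for ordering tiers (by how many items outside the class a class is known to beat) are all assumptions made for illustration. Items that can each reach the other in the closed preference graph are grouped into one equivalence class, and the classes are then listed from strongest to weakest.

```python
def closure(items, edges):
    """Warshall-style transitive closure; beats[a] = items a is known to beat."""
    beats = {x: set() for x in items}
    for a, b in edges:
        beats[a].add(b)
    for mid in items:
        for a in items:
            if mid in beats[a]:
                beats[a] |= beats[mid]
    return beats


def tiers(items, edges):
    """Collapse preference cycles into equivalence classes and order the classes.

    Hypothetical helper: two items share a class when each beats the other
    in the closure, i.e. they lie on a common cycle.
    """
    beats = closure(items, edges)
    classes, assigned = [], set()
    for x in items:
        if x in assigned:
            continue
        cls = {x} | {y for y in beats[x] if x in beats[y]}
        assigned |= cls
        classes.append(cls)
    # Assumed ordering rule: a class that beats more outside items ranks higher.
    classes.sort(key=lambda c: len(beats[next(iter(c))] - c), reverse=True)
    return [sorted(c) for c in classes]


# Toy preferences with a cycle b > c > d > b.
items = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "b"), ("d", "e")]
result = tiers(items, edges)
print(result)  # [['a'], ['b', 'c', 'd'], ['e']]
```

Here the three items on the cycle end up in a single middle tier, between `a` and `e`, rather than being forced into an arbitrary strict order.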

📝 Abstract

Selecting the top $m$ from $n$ items via expensive $k$-wise comparisons is fundamental to settings ranging from LLM-based document reranking to crowdsourced evaluation and tournament design. Existing methods either rely on heuristics that fail to fully exploit the information each comparison reveals, or are inefficient when they do. We introduce a tournament graph framework that provides a principled foundation for $k$-wise ranking. Our key observation is that each $k$-item comparison reveals a complete tournament of $\binom{k}{2}$ pairwise preferences; aggregating these into a global preference graph and computing its transitive closure yields many additional orderings without further oracle calls. We formalize when an item's rank is certifiably determined and design a greedy query schedule that maximizes information gain towards identifying the top-$m$ items. The framework also gracefully handles non-transitive preferences (cycles induced by real-world oracles) by collapsing them into equivalence classes that yield principled tiered rankings. Applied to LLM reranking across 14 benchmarks and 5 models, our method achieves Pareto dominance over existing approaches: matching or exceeding accuracy while requiring 25-40% fewer tokens than comparable methods, and $7\times$ fewer than pairwise reranking at near-identical quality.
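The key observation in the abstract can be made concrete with a small Python sketch. It assumes each oracle call returns a best-to-worst ordering of its $k$ items and that preferences are acyclic; the function names, the toy items, and the top-$m$ certificate used here (an item that is known to beat at least $n - m$ others can have at most $m - 1$ items above it) are illustrative assumptions, not the paper's definitions.

```python
from itertools import combinations


def tournament_edges(ranked_items):
    """One k-wise comparison (best-to-worst order) yields C(k, 2) pairwise edges."""
    return set(combinations(ranked_items, 2))


def transitive_closure(items, edges):
    """Warshall-style closure; beats[a] = items a is known to beat."""
    beats = {x: set() for x in items}
    for a, b in edges:
        beats[a].add(b)
    for mid in items:
        for a in items:
            if mid in beats[a]:
                beats[a] |= beats[mid]
    return beats


def certified_top_m(beats, m):
    """Assumed certificate: beating >= n - m items guarantees a top-m place."""
    n = len(beats)
    return {x for x, won in beats.items() if len(won) >= n - m}


# Two 3-wise comparisons over n = 5 items (toy data).
items = ["a", "b", "c", "d", "e"]
edges = tournament_edges(["a", "b", "c"]) | tournament_edges(["c", "d", "e"])
beats = transitive_closure(items, edges)

direct = len(edges)                             # 6 edges observed directly
implied = sum(len(w) for w in beats.values())   # 10 ordered pairs after closure
top2 = certified_top_m(beats, m=2)
print(direct, implied, sorted(top2))  # 6 10 ['a', 'b']
```

In this toy case two oracle calls reveal 6 pairwise preferences directly, and the closure adds 4 more, which is enough to order all 5 items and certify the top 2 without a third call.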

Problem

Research questions and friction points this paper is trying to address.

zero-shot reranking

large language models

tournament graphs

preference aggregation

retrieval-augmented generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

tournament graphs

zero-shot reranking

preference aggregation