🤖 AI Summary
This work addresses a limitation of conventional pointwise generative ranking methods: they rely on next-token prediction losses that lack explicit ranking awareness, and therefore fail to fully harness large language models for information retrieval. The authors propose a pointwise generative ranking framework that represents documents via multi-token identifiers and uses autoregressive generation with beam search for efficient ranking. Central to this approach is the SToICaL loss function, which injects both item-level and token-level ranking-aware supervision signals within a pointwise setting. Theoretical analysis shows that multi-token document ID representations are strictly more expressive than dual-encoder architectures. Experiments on the WordNet and ESCI datasets confirm that the method effectively suppresses the generation of invalid document identifiers and significantly outperforms existing baselines on Top-2 and higher ranking metrics.
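To make the ranking procedure concrete, here is a minimal, self-contained sketch of pointwise autoregressive ranking with multi-token docIDs: a prefix tree over valid identifiers constrains beam search so only real documents can be generated, and each document is scored by its sequence log-probability. The corpus, docIDs, and the `toy_lm` probability table are hypothetical stand-ins for an actual finetuned LLM, invented purely for illustration.

```python
import math

# Hypothetical mini-corpus: each document has a multi-token identifier.
DOCIDS = {
    "d1": ("sports", "soccer"),
    "d2": ("sports", "tennis"),
    "d3": ("news", "politics"),
}

def build_trie(docids):
    """Prefix tree over valid docID token sequences."""
    root = {}
    for tokens in docids.values():
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
    return root

def toy_lm(prefix):
    """Stand-in for an LLM: next-token log-probs given a prefix.
    These numbers are made up for illustration only."""
    table = {
        (): {"sports": math.log(0.7), "news": math.log(0.3)},
        ("sports",): {"soccer": math.log(0.6), "tennis": math.log(0.4)},
        ("news",): {"politics": math.log(1.0)},
    }
    return table[prefix]

def rank_by_beam_search(docids, beam_size=3):
    """Rank documents by sequence log-probability, expanding only
    branches of the prefix tree so invalid docIDs are never emitted."""
    trie = build_trie(docids)
    beams = [((), 0.0, trie)]   # (prefix, cumulative log-prob, trie node)
    finished = []
    while beams:
        new_beams = []
        for prefix, logp, node in beams:
            if not node:        # leaf node: a complete, valid docID
                finished.append((prefix, logp))
                continue
            logprobs = toy_lm(prefix)
            for tok, child in node.items():
                new_beams.append((prefix + (tok,), logp + logprobs[tok], child))
        new_beams.sort(key=lambda b: -b[1])
        beams = new_beams[:beam_size]
    seq_to_doc = {seq: doc for doc, seq in docids.items()}
    finished.sort(key=lambda b: -b[1])
    return [seq_to_doc[seq] for seq, _ in finished]
```

With these toy probabilities, `d1` scores 0.7 × 0.6 = 0.42, `d3` scores 0.3, and `d2` scores 0.28, so the full ranking is `["d1", "d3", "d2"]` — note the pointwise setup produces an entire ranked list from one constrained beam search, with no per-pair or per-document cross-encoder calls.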
📝 Abstract
The success of Large Language Models (LLMs) has motivated a shift toward generative approaches to retrieval and ranking, aiming to supersede classical Dual Encoders (DEs) and Cross Encoders (CEs). A prominent paradigm is pointwise Autoregressive Ranking (ARR), where an LLM generates document identifiers (docIDs) token-by-token to enable ranking via beam search. ARR offers the promise of superior expressivity compared to DEs while avoiding the prohibitive computational cost of CEs. However, a formal theoretical foundation for this expressive power has been missing. Moreover, the standard next-token prediction loss is rank-agnostic and ill-suited to finetuning an LLM for ranking tasks. In this paper, we first prove that the expressive capacity of ARR is strictly superior to that of DEs: while a DE requires an embedding dimension that grows linearly with corpus size to realize arbitrary rankings, ARR achieves this with a constant hidden dimension. We then propose SToICaL (Simple Token-Item Calibrated Loss), a generalized rank-aware training loss for LLM finetuning. By using item-level reweighting and prefix-tree marginalization, we distribute probability mass over valid docID tokens based on their ground-truth relevance. Experiments on WordNet and ESCI datasets verify that our loss suppresses invalid docID generations and significantly improves ranking metrics beyond top-1 retrieval.
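The abstract names two mechanisms, item-level reweighting and prefix-tree marginalization, without giving SToICaL's exact formula. The sketch below shows one plausible reading of that pipeline, not the paper's actual loss: a softmax over ground-truth relevance grades assigns each document a target mass, and marginalizing those masses over the docID prefix tree yields a soft next-token target distribution at every prefix, against which the LM's predictions could be trained with cross-entropy. All docIDs, relevance grades, and the temperature parameter are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical mini-corpus: multi-token docIDs with graded relevance labels.
DOCIDS = {
    "d1": ("sports", "soccer"),
    "d2": ("sports", "tennis"),
    "d3": ("news", "politics"),
}
RELEVANCE = {"d1": 3.0, "d2": 1.0, "d3": 0.0}  # ground-truth grades (made up)

def item_level_weights(relevance, temp=1.0):
    """Item-level reweighting: a softmax over relevance grades, so more
    relevant documents receive more of the target probability mass."""
    z = sum(math.exp(r / temp) for r in relevance.values())
    return {doc: math.exp(r / temp) / z for doc, r in relevance.items()}

def token_level_targets(docids, item_weights):
    """Prefix-tree marginalization: the target mass of a token at a given
    prefix is the summed weight of all docIDs that continue through it,
    renormalized per prefix into a soft next-token distribution."""
    sums = defaultdict(lambda: defaultdict(float))
    for doc, tokens in docids.items():
        w = item_weights[doc]
        for i, tok in enumerate(tokens):
            sums[tokens[:i]][tok] += w
    targets = {}
    for prefix, dist in sums.items():
        total = sum(dist.values())
        targets[prefix] = {tok: v / total for tok, v in dist.items()}
    return targets
```

Unlike the one-hot targets of next-token prediction, every valid continuation here receives mass proportional to the relevance of the documents beneath it, which is what makes the supervision signal rank-aware at the token level while remaining pointwise at the item level.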