🤖 AI Summary
This work addresses a limitation of conventional pointwise generative ranking methods: they rely on next-token prediction losses that lack explicit ranking awareness, and therefore fail to fully harness large language models for information retrieval. The authors propose a pointwise generative ranking framework that represents documents via multi-token identifiers and uses autoregressive generation with beam search for efficient ranking. Central to this approach is the SToICaL loss function, which injects both item-level and token-level ranking-aware supervision signals within a pointwise setting. Theoretical analysis shows that multi-token document ID representations are strictly more expressive than dual-encoder architectures. Experiments on the WordNet and ESCI datasets confirm that the method effectively suppresses the generation of invalid document identifiers and significantly outperforms existing baselines on Top-2 and higher ranking metrics.
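To make the ranking procedure concrete, here is a minimal, self-contained sketch of pointwise autoregressive ranking with multi-token docIDs: a prefix tree over valid identifiers constrains beam search so only real documents can be generated, and each document is scored by its sequence log-probability. The corpus, docIDs, and the `toy_lm` probability table are hypothetical stand-ins for an actual finetuned LLM, invented purely for illustration.

```python
import math

# Hypothetical mini-corpus: each document has a multi-token identifier.
DOCIDS = {
    "d1": ("sports", "soccer"),
    "d2": ("sports", "tennis"),
    "d3": ("news", "politics"),
}

def build_trie(docids):
    """Prefix tree over valid docID token sequences."""
    root = {}
    for tokens in docids.values():
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
    return root

def toy_lm(prefix):
    """Stand-in for an LLM: next-token log-probs given a prefix.
    These numbers are made up for illustration only."""
    table = {
        (): {"sports": math.log(0.7), "news": math.log(0.3)},
        ("sports",): {"soccer": math.log(0.6), "tennis": math.log(0.4)},
        ("news",): {"politics": math.log(1.0)},
    }
    return table[prefix]

def rank_by_beam_search(docids, beam_size=3):
    """Rank documents by sequence log-probability, expanding only
    branches of the prefix tree so invalid docIDs are never emitted."""
    trie = build_trie(docids)
    beams = [((), 0.0, trie)]   # (prefix, cumulative log-prob, trie node)
    finished = []
    while beams:
        new_beams = []
        for prefix, logp, node in beams:
            if not node:        # leaf node: a complete, valid docID
                finished.append((prefix, logp))
                continue
            logprobs = toy_lm(prefix)
            for tok, child in node.items():
                new_beams.append((prefix + (tok,), logp + logprobs[tok], child))
        new_beams.sort(key=lambda b: -b[1])
        beams = new_beams[:beam_size]
    seq_to_doc = {seq: doc for doc, seq in docids.items()}
    finished.sort(key=lambda b: -b[1])
    return [seq_to_doc[seq] for seq, _ in finished]
```

With these toy probabilities, `d1` scores 0.7 × 0.6 = 0.42, `d3` scores 0.3, and `d2` scores 0.28, so the full ranking is `["d1", "d3", "d2"]` — note the pointwise setup produces an entire ranked list from one constrained beam search, with no per-pair or per-document cross-encoder calls.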
📝 Abstract
The success of Large Language Models (LLMs) has motivated a shift toward generative approaches to retrieval and ranking, aiming to supersede classical Dual Encoders (DEs) and Cross Encoders (CEs). A prominent paradigm is pointwise Autoregressive Ranking (ARR), where an LLM generates document identifiers (docIDs) token-by-token to enable ranking via beam search. ARR offers the promise of superior expressivity compared to DEs while avoiding the prohibitive computational cost of CEs. However, a formal theoretical foundation for this expressive power has been missing. Moreover, the standard next-token prediction loss is rank-agnostic and ill-suited to finetuning an LLM for ranking tasks. In this paper, we first prove that the expressive capacity of ARR is strictly superior to that of DEs: while a DE requires an embedding dimension that grows linearly with corpus size to realize arbitrary rankings, ARR achieves this with a constant hidden dimension. We then propose SToICaL (Simple Token-Item Calibrated Loss), a generalized rank-aware training loss for LLM finetuning. By using item-level reweighting and prefix-tree marginalization, we distribute probability mass over valid docID tokens based on their ground-truth relevance. Experiments on WordNet and ESCI datasets verify that our loss suppresses invalid docID generations and significantly improves ranking metrics beyond top-1 retrieval.
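The abstract names two mechanisms, item-level reweighting and prefix-tree marginalization, without giving SToICaL's exact formula. The sketch below shows one plausible reading of that pipeline, not the paper's actual loss: a softmax over ground-truth relevance grades assigns each document a target mass, and marginalizing those masses over the docID prefix tree yields a soft next-token target distribution at every prefix, against which the LM's predictions could be trained with cross-entropy. All docIDs, relevance grades, and the temperature parameter are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical mini-corpus: multi-token docIDs with graded relevance labels.
DOCIDS = {
    "d1": ("sports", "soccer"),
    "d2": ("sports", "tennis"),
    "d3": ("news", "politics"),
}
RELEVANCE = {"d1": 3.0, "d2": 1.0, "d3": 0.0}  # ground-truth grades (made up)

def item_level_weights(relevance, temp=1.0):
    """Item-level reweighting: a softmax over relevance grades, so more
    relevant documents receive more of the target probability mass."""
    z = sum(math.exp(r / temp) for r in relevance.values())
    return {doc: math.exp(r / temp) / z for doc, r in relevance.items()}

def token_level_targets(docids, item_weights):
    """Prefix-tree marginalization: the target mass of a token at a given
    prefix is the summed weight of all docIDs that continue through it,
    renormalized per prefix into a soft next-token distribution."""
    sums = defaultdict(lambda: defaultdict(float))
    for doc, tokens in docids.items():
        w = item_weights[doc]
        for i, tok in enumerate(tokens):
            sums[tokens[:i]][tok] += w
    targets = {}
    for prefix, dist in sums.items():
        total = sum(dist.values())
        targets[prefix] = {tok: v / total for tok, v in dist.items()}
    return targets
```

Unlike the one-hot targets of next-token prediction, every valid continuation here receives mass proportional to the relevance of the documents beneath it, which is what makes the supervision signal rank-aware at the token level while remaining pointwise at the item level.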