AI Summary
This work addresses two key bottlenecks in listwise reranking with large language models. First, information in the middle of long inputs gets overlooked (the "lost in the middle" effect). Second, inference latency grows superlinearly with input length. To overcome these limitations, the authors propose ResRank, a framework that unifies retrieval and reranking through end-to-end joint training. ResRank compresses each candidate passage into a single embedding. It then uses residual connections, which combine these compressed embeddings with the reranker's contextualized hidden states, to align them with the ranking space. Instead of autoregressive decoding, it scores all passages in a single non-autoregressive step using cosine similarity. ResRank was evaluated on the TREC Deep Learning benchmark and eight BEIR datasets. It matches or surpasses state-of-the-art reranking performance while processing only one token per passage and generating no tokens at all, which substantially improves inference efficiency.
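To see why feeding one token per passage matters, the short calculation below compares attention cost for a full-text listwise prompt against a compressed pipeline. It is a rough sketch under loudly stated assumptions. The workload (20 passages of about 100 tokens, a 10-token query) is hypothetical and not taken from the paper. Only the quadratic self-attention term of a single layer is counted. The compressed pipeline is assumed to encode each passage independently and then run the reranker once over the query plus one embedding token per passage.

```python
# Hypothetical workload: these numbers are illustrative, not from the paper.
NUM_PASSAGES = 20          # candidates reranked in one listwise pass
TOKENS_PER_PASSAGE = 100   # assumed average passage length in tokens
QUERY_TOKENS = 10          # assumed query length in tokens


def attention_cost(seq_len: int) -> int:
    """Pairwise token interactions in one self-attention layer (grows as n^2)."""
    return seq_len * seq_len


# Full-text listwise reranking: the query and all passages share one long sequence.
full_text_len = QUERY_TOKENS + NUM_PASSAGES * TOKENS_PER_PASSAGE
baseline = attention_cost(full_text_len)

# Compressed pipeline (assumed): each passage is encoded on its own, then the
# reranker sees the query plus a single embedding token per passage.
compressed_len = QUERY_TOKENS + NUM_PASSAGES * 1
encoder = NUM_PASSAGES * attention_cost(TOKENS_PER_PASSAGE)
reranker = attention_cost(compressed_len)

print(full_text_len, compressed_len)             # sequence lengths seen by the reranker
print(baseline, encoder + reranker)              # attention interactions per layer
print(round(baseline / (encoder + reranker), 1)) # relative attention cost
```

Under these assumed numbers, the reranker's input shrinks from 2010 to 30 tokens. The per-passage encoder cost then dominates what remains. Linear terms such as MLP layers, and the removal of autoregressive decoding steps, are not modeled here, so measured latency gains will differ from this ratio.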
Abstract
Large language model (LLM)-based listwise reranking has emerged as the dominant paradigm for achieving state-of-the-art ranking effectiveness in information retrieval. However, its reliance on feeding full passage texts into the LLM introduces two critical bottlenecks. First, the "lost in the middle" phenomenon degrades ranking quality as input length grows. Second, inference latency scales super-linearly with sequence length, which makes the approach impractical for industrial deployment. In this paper, we present ResRank, a unified retrieval-reranking framework that addresses both challenges. Inspired by multimodal LLMs that project visual inputs into compact token representations, ResRank employs an Encoder-LLM to compress each candidate passage into a single embedding. These embeddings are then fed alongside the query text into a Reranker-LLM for listwise ranking. To alleviate the misalignment between the compressed representation space and the ranking space, we introduce a residual connection structure that combines encoder embeddings with contextualized hidden states from the reranker. Furthermore, we replace conventional autoregressive decoding with a one-step, cosine-similarity-based scoring mechanism, eliminating the generation bottleneck entirely. ResRank is trained with a dual-stage, multi-task, end-to-end joint optimization strategy that trains the encoder and reranker simultaneously. This aligns the learning objectives of retrieval and reranking while substantially reducing training complexity. Extensive experiments on TREC Deep Learning and eight BEIR benchmark datasets show that ResRank achieves ranking effectiveness competitive with or superior to existing approaches. It does so while generating zero tokens and processing only one token per passage, yielding a better balance between effectiveness and efficiency.
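The residual connection and one-step cosine scoring described above can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several details are assumptions. The residual is taken to be plain addition of the encoder embedding and the reranker's hidden state at that passage's position. Both vectors are assumed to share one dimension d (a real system may need a projection between the two LLMs). The query representation is assumed to be a single reranker hidden state. The function names `resrank_style_scores` and `rank` are hypothetical.

```python
import numpy as np


def resrank_style_scores(passage_emb, passage_hidden, query_hidden):
    """Score all passages in one step (illustrative sketch, not the paper's code).

    passage_emb:    (n, d) one compressed embedding per passage (Encoder-LLM output)
    passage_hidden: (n, d) assumed reranker hidden states at each passage's token
    query_hidden:   (d,)   assumed reranker hidden state used as the query vector
    """
    # Assumed residual connection: add the encoder embedding back onto the
    # contextualized hidden state so the fused vector keeps both signals.
    fused = passage_emb + passage_hidden
    # Cosine similarity = dot product of L2-normalized vectors.
    fused_unit = fused / np.linalg.norm(fused, axis=1, keepdims=True)
    query_unit = query_hidden / np.linalg.norm(query_hidden)
    return fused_unit @ query_unit


def rank(scores):
    # The full ranking comes from one sort; no tokens are generated.
    return np.argsort(-scores, kind="stable")


# Toy usage with random vectors; the query is built to match passage 2 exactly.
rng = np.random.default_rng(0)
n, d = 4, 8
emb = rng.normal(size=(n, d))
hid = rng.normal(size=(n, d))
query = emb[2] + hid[2]
scores = resrank_style_scores(emb, hid, query)
order = rank(scores)
print(order[0])  # → 2
```

Because the query vector equals passage 2's fused vector, that passage has cosine similarity 1 and ranks first. The point of the design is that a single forward pass plus a sort replaces decoding a permutation token by token.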