Efficient Listwise Reranking with Compressed Document Representations

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational cost of large language model (LLM)-based reranking, where efficiency and effectiveness are difficult to balance. The authors propose RRK, a listwise reranker that compresses each document into a fixed-size, multi-token embedding representation and is trained via distillation. An 8B-parameter RRK model runs 3x-18x faster than smaller rerankers (0.6-4B parameters) while matching or outperforming them in effectiveness, and its efficiency advantage widens further on long-document benchmarks.
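
The summary mentions training by distillation without stating the objective. As a generic illustration only, and not the authors' loss, one common listwise formulation turns teacher and student scores over the same candidate list into distributions and penalizes their KL divergence. The function names, the temperature parameter, and the example scores below are all assumptions made for this sketch.

```python
import math


def softmax(scores, temperature=1.0):
    """Convert a list of scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_kl(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over one candidate list (hypothetical objective)."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores for three candidate documents.
agree = listwise_kl([3.0, 2.0, 1.0], [3.0, 2.0, 1.0])
disagree = listwise_kl([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when the student reproduces the teacher's distribution over the list and grows as the two rankings diverge.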

📝 Abstract

Reranking, the process of refining the output from a first-stage retriever, is often considered computationally expensive, especially when using Large Language Models (LLMs). A common approach to mitigate this cost involves utilizing smaller LLMs or controlling input length. Inspired by recent advances in document compression for retrieval-augmented generation (RAG), we introduce RRK, an efficient and effective listwise reranker compressing documents into multi-token fixed-size embedding representations. Our simple training via distillation shows that this combination of rich compressed representations and listwise reranking yields a highly efficient and effective system. In particular, our 8B-parameter model runs 3x-18x faster than smaller rerankers (0.6-4B parameters) while matching or outperforming them in effectiveness. The efficiency gains are even more striking on long-document benchmarks, where RRK widens its advantage further.
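
To make the intuition behind compressed listwise reranking concrete, the following back-of-the-envelope sketch compares how many input positions a listwise reranker reads when candidates are passed as raw text versus as a fixed number of embeddings per document. Every number here (20 candidates, 512 tokens per document, 8 embeddings per document, a 32-token query) is a hypothetical assumption, not a figure from the paper, and the ratio describes input length only, not measured latency or the cost of producing the compressed representations.

```python
def raw_input_length(num_docs: int, tokens_per_doc: int, query_tokens: int = 32) -> int:
    """Input positions when every candidate is passed as raw text."""
    return query_tokens + num_docs * tokens_per_doc


def compressed_input_length(num_docs: int, embeddings_per_doc: int, query_tokens: int = 32) -> int:
    """Input positions when each candidate is a fixed number of embeddings."""
    return query_tokens + num_docs * embeddings_per_doc


# Hypothetical setting: 20 candidates, 512 tokens each, 8 embeddings each.
raw = raw_input_length(num_docs=20, tokens_per_doc=512)
compressed = compressed_input_length(num_docs=20, embeddings_per_doc=8)
ratio = raw / compressed
print(raw, compressed, ratio)  # 10272 192 53.5
```

Because the compressed size per document is fixed, the gap grows with document length, which is consistent with the abstract's remark that the advantage widens on long-document benchmarks.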

Problem

Research questions and friction points this paper is trying to address.

reranking

computational efficiency

large language models

document compression

retrieval-augmented generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

listwise reranking

document compression

compressed embeddings