SumRank: Aligning Summarization Models for Long-Document Listwise Reranking

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance and efficiency bottlenecks in listwise reranking of long documents, which arise due to excessive context length. To this end, we propose SumRank, a pointwise summarization model explicitly aligned with reranking objectives. SumRank jointly optimizes document summarization and downstream ranking through a three-stage training paradigm: supervised fine-tuning, reinforcement learning data construction, and ranking-aware reinforcement learning. This approach significantly compresses document length while preserving critical relevance signals. Experimental results demonstrate that SumRank achieves state-of-the-art performance across five TREC Deep Learning benchmarks (2019–2023) and substantially reduces computational overhead and reranking complexity.
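The ranking-aware reinforcement learning stage rewards the summarizer for summaries that let the downstream reranker order the candidate pool well. A minimal sketch of one plausible reward, assuming NDCG@k over the reranked list is the alignment signal (the reward choice and function signatures are assumptions, not the paper's specification):

```python
import math

def ndcg_at_k(ranking: list[int], relevance: dict[int, float], k: int = 10) -> float:
    """Standard NDCG@k over a ranked list of document ids."""
    dcg = sum(relevance.get(d, 0.0) / math.log2(i + 2)
              for i, d in enumerate(ranking[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def ranking_reward(reranked: list[int], relevance: dict[int, float]) -> float:
    # RL reward for a candidate summary: how well the downstream listwise
    # reranker orders the pool when this summary stands in for the full
    # document. Higher NDCG -> higher reward for the summarization policy.
    return ndcg_at_k(reranked, relevance)
```

A summary that preserves the relevance signal (keeping the relevant document on top) earns a higher reward than one that buries it, which is the alignment pressure the summary describes.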

📝 Abstract
Large Language Models (LLMs) have demonstrated superior performance in the listwise passage reranking task. However, directly applying them to rank long-form documents introduces both effectiveness and efficiency issues due to the substantially increased context length. To address this challenge, we propose SumRank, a pointwise summarization model aligned with downstream listwise reranking, which compresses long-form documents into concise, rank-aligned summaries before the final listwise reranking stage. To train SumRank, we introduce a three-stage pipeline comprising cold-start Supervised Fine-Tuning (SFT), specialized RL data construction, and rank-driven alignment via Reinforcement Learning. This paradigm aligns SumRank with downstream ranking objectives so that relevance signals are preserved. We conduct extensive experiments on five benchmark datasets from the TREC Deep Learning tracks (TREC DL 19-23). Results show that our lightweight SumRank model achieves state-of-the-art (SOTA) ranking performance while significantly improving efficiency by reducing both summarization overhead and reranking complexity.
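The two-stage inference flow in the abstract (pointwise summarization of each document, then listwise reranking over the compressed summaries) can be sketched as follows. This is a toy illustration only: the query-term sentence selector and overlap scorer below stand in for the trained SumRank summarizer and the LLM listwise reranker, and all function names are assumptions:

```python
def summarize(query: str, document: str, max_sentences: int = 2) -> str:
    """Pointwise step: compress one long document, keeping the sentences
    that share terms with the query (toy stand-in for SumRank)."""
    q_terms = set(query.lower().split())
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    # Stable sort: higher query-term overlap first, original order on ties.
    ranked = sorted(sentences,
                    key=lambda s: -len(q_terms & set(s.lower().split())))
    return ". ".join(ranked[:max_sentences])

def listwise_rerank(query: str, summaries: list[str]) -> list[int]:
    """Listwise step: order candidate indices by a toy relevance score
    computed over the compressed summaries (stand-in for the LLM reranker)."""
    q_terms = set(query.lower().split())
    scores = [len(q_terms & set(s.lower().split())) for s in summaries]
    return sorted(range(len(summaries)), key=lambda i: -scores[i])

def rerank_long_documents(query: str, documents: list[str]) -> list[int]:
    # 1) Pointwise summarization shrinks each document independently,
    #    so this stage parallelizes and the reranker's context stays short.
    summaries = [summarize(query, d) for d in documents]
    # 2) Listwise reranking runs once over the compressed pool.
    return listwise_rerank(query, summaries)
```

The efficiency claim follows from the structure: the expensive listwise comparison sees only short summaries, while the per-document compression cost scales linearly with pool size.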
Problem

Research questions and friction points this paper is trying to address.

long-document reranking
large language models
context length
efficiency
effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

SumRank
listwise reranking
long-document summarization
reinforcement learning alignment
rank-aware summarization