Contrastive Retrieval Heads Improve Attention-Based Re-Ranking

📅 2025-10-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Attention-based re-rankers suffer from noise and redundancy due to the excessive number of Transformer attention heads, limiting retrieval effectiveness. Method: We propose CoRe (Contrastive Re-ranking), a parameter-free framework that employs contrastive learning to quantify the discriminative attention of each head toward relevant documents, enabling dynamic selection of high-value heads. Furthermore, we introduce a relative ranking criterion to identify the optimal head distribution—empirically concentrated in middle layers—and prune the last 50% of layers to accelerate inference. Contribution/Results: CoRe requires no fine-tuning, supports zero-shot and long-context large language models (LLMs), and operates at the list level. Evaluated on three mainstream LLMs, it achieves state-of-the-art re-ranking performance using fewer than 1% of attention heads—significantly outperforming strong baselines while drastically reducing computational overhead.
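The contrastive head-selection idea in the summary can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes we already have per-head attention mass from the query toward each candidate document (e.g. averaged over query tokens), plus relevance labels for a small calibration set. The function names (`contrastive_head_scores`, `select_core_heads`) and the mean-difference scoring are hypothetical simplifications of the paper's contrastive metric.

```python
import numpy as np

def contrastive_head_scores(attn, relevant_mask):
    """Contrastively score each attention head.

    attn: (num_heads, num_docs) array of query-to-document attention
          mass per head.
    relevant_mask: (num_docs,) boolean array marking relevant candidates
          (from a small labeled calibration set).

    A head scores high when its attention concentrates on relevant
    documents and low when it concentrates on irrelevant ones.
    """
    rel = attn[:, relevant_mask].mean(axis=1)
    irr = attn[:, ~relevant_mask].mean(axis=1)
    return rel - irr

def select_core_heads(attn, relevant_mask, top_frac=0.01):
    """Keep only the top fraction of heads (under 1% in the paper)."""
    scores = contrastive_head_scores(attn, relevant_mask)
    k = max(1, int(len(scores) * top_frac))
    return np.argsort(scores)[::-1][:k]
```

In this sketch a head that attends heavily to irrelevant documents is penalized rather than merely ignored, which is what distinguishes the contrastive criterion from simply picking the heads with the largest raw attention.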

📝 Abstract
The strong zero-shot and long-context capabilities of recent Large Language Models (LLMs) have paved the way for highly effective re-ranking systems. Attention-based re-rankers leverage attention weights from transformer heads to produce relevance scores, but not all heads are created equal: many contribute noise and redundancy, limiting performance. To address this, we introduce CoRe heads, a small set of retrieval heads identified via a contrastive scoring metric that explicitly rewards heads whose high attention correlates with relevant documents while downplaying heads whose high attention correlates with irrelevant documents. This relative ranking criterion isolates the most discriminative heads for re-ranking and yields a state-of-the-art list-wise re-ranker. Extensive experiments with three LLMs show that aggregated signals from CoRe heads, constituting less than 1% of all heads, substantially improve re-ranking accuracy over strong baselines. We further find that CoRe heads are concentrated in middle layers, and pruning the computation of the final 50% of model layers preserves accuracy while significantly reducing inference time and memory usage.
Problem

Research questions and friction points this paper is trying to address.

Identifying optimal transformer heads for document re-ranking
Reducing noise from irrelevant attention heads in retrieval
Improving re-ranking accuracy while maintaining computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive scoring identifies optimal retrieval attention heads
CoRe heads constitute under 1% of total model heads
Pruning final layers maintains accuracy while reducing computation
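The innovations above combine at inference time: aggregate the attention of the selected heads into per-document scores, using activations from only the first half of the layer stack. The sketch below is an assumed, simplified illustration; `rerank_with_core_heads` and the plain sum aggregation are hypothetical stand-ins for the paper's list-wise scoring, and the truncation assumes all selected heads lie in the first 50% of layers, as the paper reports empirically.

```python
import numpy as np

def rerank_with_core_heads(attn, core_heads):
    """Rank candidate documents by aggregated CoRe-head attention.

    attn: (num_heads, num_docs) query-to-candidate attention collected
          from a truncated forward pass (only the first ~50% of layers,
          since the selected heads concentrate in middle layers).
    core_heads: indices of the selected retrieval heads.

    Returns candidate indices sorted by descending aggregated attention.
    """
    doc_scores = attn[core_heads].sum(axis=0)
    return np.argsort(-doc_scores)
```

Because scoring reads existing attention weights and adds no parameters, the approach needs no fine-tuning; skipping the final layers is what delivers the reported inference-time and memory savings.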
👥 Authors
Linh Tran, Rensselaer Polytechnic Institute
Yulong Li, IBM Research
Radu Florian, Research Staff Member, IBM
Wei Sun, IBM Research