Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking

πŸ“… 2025-06-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address inefficient and inaccurate retrieval in long-context language models, this paper proposes QRHEAD, a query-aware mechanism for identifying high-value retrieval heads: attention heads are scored by their query-to-context attention mass on a handful of task-specific examples with ground-truth relevant spans. Building on QRHEAD, the authors design QR-RETRIEVER, a lightweight, zero-shot, plug-and-play retriever that requires no fine-tuning and applies directly to long-context reasoning and re-ranking tasks. Evaluated on LongMemEval and CLIPPER, QR-RETRIEVER outperforms full-context baselines by over 10%; on BEIR zero-shot re-ranking, it significantly surpasses LLM-based re-rankers such as RankGPT; and it demonstrates strong generalization on Needle-in-a-Haystack and multi-hop reasoning benchmarks. The core contribution is the first formulation of attention-head selection as a query-driven dynamic subset-identification problem, enabling efficient, general-purpose, and training-free retrieval augmentation for long-context LMs.
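The head-identification step described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `select_qr_heads`, the array shapes, and the exact aggregation (summing attention mass on ground-truth tokens, averaged over query tokens) are assumptions for the sketch.

```python
import numpy as np

def select_qr_heads(attn, gold_mask, top_k=16):
    """Hypothetical sketch: rank attention heads by how much query-token
    attention mass they place on ground-truth relevant context tokens.

    attn:      (n_layers, n_heads, n_query_tokens, n_ctx_tokens) attention
               weights from query positions to context positions.
    gold_mask: boolean (n_ctx_tokens,) marking the gold relevant span.
    Returns the top_k (layer, head) pairs by score.
    """
    # Attention mass on gold tokens, averaged over query tokens.
    mass = attn[..., gold_mask].sum(-1).mean(-1)  # (n_layers, n_heads)
    order = np.argsort(mass.ravel())[::-1][:top_k]
    n_heads = attn.shape[1]
    return [(int(i) // n_heads, int(i) % n_heads) for i in order]

# Toy check: one head that concentrates on the gold tokens should win.
attn = np.full((2, 2, 3, 4), 0.25)                      # uniform heads
gold = np.array([False, False, True, True])
attn[1, 0] = np.array([[0.05, 0.05, 0.45, 0.45]] * 3)   # focused head
best = select_qr_heads(attn, gold, top_k=1)              # -> [(1, 0)]
```

In practice the scores would be averaged over the handful of real-task examples the paper mentions, so that head selection reflects the downstream task rather than a single input.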

πŸ“ Abstract
Recent work has identified retrieval heads (Wu et al., 2025b), a subset of attention heads responsible for retrieving salient information in long-context language models (LMs), as measured by their copy-paste behavior in Needle-in-a-Haystack tasks. In this paper, we introduce QRHEAD (Query-Focused Retrieval Head), an improved set of attention heads that enhance retrieval from long context. We identify QRHEAD by aggregating attention scores with respect to the input query, using a handful of examples from real-world tasks (e.g., long-context QA). We further introduce QR-RETRIEVER, an efficient and effective retriever that uses the accumulated attention mass of QRHEAD as retrieval scores. We use QR-RETRIEVER for long-context reasoning by selecting the most relevant parts with the highest retrieval scores. On multi-hop reasoning tasks LongMemEval and CLIPPER, this yields over 10% performance gains over full context and outperforms strong dense retrievers. We also evaluate QR-RETRIEVER as a re-ranker on the BEIR benchmark and find that it achieves strong zero-shot performance, outperforming other LLM-based re-rankers such as RankGPT. Further analysis shows that both the query-context attention scoring and task selection are crucial for identifying QRHEAD with strong downstream utility. Overall, our work contributes a general-purpose retriever and offers interpretability insights into the long-context capabilities of LMs.
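The retrieval step the abstract describes, scoring context passages by the accumulated attention mass of the selected heads, could look roughly like this. A minimal sketch under stated assumptions: `qr_retrieve`, the input shapes, and the chunking-by-token-bounds scheme are illustrative, not the paper's implementation.

```python
import numpy as np

def qr_retrieve(head_attn, chunk_bounds, top_n=2):
    """Hypothetical sketch: rank context chunks by the attention mass the
    selected QRHEAD heads place on them from the query tokens.

    head_attn:    (n_sel_heads, n_query_tokens, n_ctx_tokens) attention
                  weights of the pre-selected heads.
    chunk_bounds: list of (start, end) token-index pairs, one per chunk.
    Returns chunk indices sorted by descending retrieval score.
    """
    # Total attention mass received by each context token.
    token_mass = head_attn.sum(axis=(0, 1))              # (n_ctx_tokens,)
    scores = [token_mass[s:e].sum() for s, e in chunk_bounds]
    return [int(i) for i in np.argsort(scores)[::-1][:top_n]]

# Toy check: the middle chunk receives the most attention mass.
attn = np.array([[[0.1, 0.1, 0.3, 0.3, 0.1, 0.1]]])      # one head, one query token
ranked = qr_retrieve(attn, [(0, 2), (2, 4), (4, 6)], top_n=1)  # -> [1]
```

Because the scores are just sums of attention weights already computed in a forward pass, this kind of retriever adds essentially no training or inference machinery, which matches the abstract's framing of QR-RETRIEVER as efficient and zero-shot.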
Problem

Research questions and friction points this paper is trying to address.

Enhance retrieval from long-context language models
Improve long-context reasoning via query-focused attention
Develop efficient retriever for relevant context selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Query-focused retrieval heads enhance long-context retrieval
Accumulated attention mass serves as an efficient retrieval score
Improved performance on reasoning and re-ranking tasks
Wuwei Zhang
Princeton Language and Intelligence, Princeton University
Fangcong Yin
The University of Texas at Austin
Howard Yen
Princeton University
Natural language processing
Danqi Chen
Princeton University
Natural Language Processing, Machine Learning
Xi Ye
Princeton Language and Intelligence, Princeton University