Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts

📅 2025-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Long-context large language models (LLMs) suffer from distraction by irrelevant information, leading to degraded attention focus and task misalignment. To address this, we introduce the concept of the "focus direction"—a geometric property of context-aware attention heads in the key/query activation space—and propose a label-free, dynamic attention enhancement method that exploits this directional structure without explicit supervision. Our approach combines attention mechanism analysis, contextual-head identification, and directional modulation, enabling adaptive refinement of attention patterns. Evaluated across multiple long-context benchmarks, the method significantly improves both attention focus and downstream task performance. This work uncovers an intrinsic geometric mechanism underlying task misalignment in long-context settings and establishes an interpretable, fine-tuning-free paradigm for controllable attention.

📝 Abstract
Long-context large language models (LLMs) are prone to being distracted by irrelevant contexts, and the reason for this distraction remains poorly understood. In this paper, we first identify the contextual heads, a special group of attention heads that control the overall attention of the LLM. We then demonstrate that distraction arises when contextual heads fail to allocate sufficient attention to relevant contexts, and that it can be mitigated by increasing attention to these contexts. We further identify focus directions, located at the key and query activations of these heads, which enable them to allocate more attention to relevant contexts without explicitly specifying which context is relevant. We comprehensively evaluate the effect of focus directions on various long-context tasks and find that they help mitigate the poor task alignment of long-context LLMs. We believe our findings can promote further research on long-context LLM alignment.
Problem

Research questions and friction points this paper is trying to address.

LLMs are distracted by irrelevant contexts in long-context tasks
The cause of this distraction is poorly understood
Contextual heads fail to allocate sufficient attention to relevant contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identify contextual heads that control the LLM's overall attention
Apply focus directions at the key/query activations of these heads to reallocate attention
Improve task alignment in long-context LLMs without fine-tuning or explicit supervision
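The abstract describes focus directions as vectors in the key/query activation space of contextual heads that, when applied, shift attention toward relevant context. A minimal toy sketch of that steering idea is below; it is not the paper's implementation—the function, the toy vectors, and the choice to steer only the query are illustrative assumptions. Adding a direction `d` to the query raises each token's score by an amount proportional to how well that token's key aligns with `d`, so attention is re-weighted without ever labeling which token is relevant.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_with_focus(q, K, focus_dir, alpha=0.0):
    """Single-query attention for one hypothetical contextual head.

    Shifting the query along focus_dir boosts the scores of tokens
    whose keys align with that direction (alpha=0 recovers the
    unsteered attention pattern).
    """
    q_mod = q + alpha * focus_dir
    scores = K @ q_mod / np.sqrt(len(q))
    return softmax(scores)

d_model = 8
K = np.eye(5, d_model)            # toy keys: 5 context tokens, orthogonal
q = np.full(d_model, 0.1)         # toy query: equally weak match to all keys
focus_dir = K[2]                  # toy direction aligned with token 2's key

base = attention_with_focus(q, K, focus_dir, alpha=0.0)
steered = attention_with_focus(q, K, focus_dir, alpha=2.0)
print(base[2], steered[2])        # attention to token 2 rises under steering
```

With `alpha=0` the attention is uniform (0.2 per token); with `alpha=2` the mass on token 2 grows because only its key aligns with the focus direction. The paper's actual directions are learned geometric properties of real heads, not hand-picked key vectors as in this toy.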