Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

📅 2026-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses Attention Sink (AS), the phenomenon in Transformer models whereby a disproportionate share of attention is allocated to specific yet uninformative tokens, undermining model interpretability, destabilizing training and inference, and exacerbating hallucination issues. The paper presents the first comprehensive survey of this phenomenon, introducing a three-dimensional taxonomy (Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation) to systematically organize the evolving research landscape. By synthesizing recent findings on anomalous attention behaviors, the study constructs a structured knowledge base that clarifies core concepts and key challenges. It offers both theoretical insights and practical pathways for understanding and managing AS, and further supports community advancement by releasing a curated list of relevant publications.

📝 Abstract
As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affects training and inference dynamics, and exacerbates issues such as hallucinations. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation. Our work provides a pivotal contribution by clarifying key concepts and guiding researchers through the evolution and trends of the field. We envision this survey as a definitive resource, empowering researchers and practitioners to effectively manage AS within the current Transformer paradigm, while simultaneously inspiring innovative advancements for the next generation of Transformers. The paper list of this work is available at https://github.com/ZunhaiSu/Awesome-Attention-Sink.
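Attention sink is often quantified as the fraction of each query's attention mass that lands on a designated token (typically the first one). A minimal NumPy sketch of that measurement, not taken from the paper: the helper names and the injected logit bias toward token 0 are illustrative assumptions that mimic a sink rather than reproduce a trained model's behavior.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) -> 0 handles masked entries.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def first_token_attention_mass(scores):
    """Fraction of each query's attention that lands on token 0
    under causal softmax attention -- a common proxy for the sink effect."""
    T = scores.shape[-1]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # mask future positions
    attn = softmax(np.where(future, -np.inf, scores), axis=-1)
    return attn[:, 0]  # per-query weight assigned to the first token

rng = np.random.default_rng(0)
T = 8
scores = rng.normal(size=(T, T))
scores[:, 0] += 4.0  # toy logit bias toward token 0, mimicking a sink
mass = first_token_attention_mass(scores)
print(mass.round(2))
```

With the bias in place, every query concentrates most of its attention on the first token even though that token carries no special content, which is the qualitative signature the survey's abstract describes.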
Problem

Research questions and friction points this paper is trying to address.

Attention Sink
Transformers
Interpretability
Hallucinations
Attention Mechanism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention Sink
Transformers
Interpretability
Attention Mechanism
Survey
Zunhai Su
Tsinghua University
Hengyuan Zhang
Ph.D. Student, University of California San Diego
Robotics · Computer Vision · Autonomous Vehicles · Sensor Fusion
Wei Wu
Meituan LongCat Team
Yifan Zhang
Meituan LongCat Team
Yaxiu Liu
Tsinghua University
He Xiao
The University of Hong Kong
Qingyao Yang
The University of Hong Kong
Yuxuan Sun
Meituan LongCat Team
Rui Yang
Meituan LongCat Team
Chao Zhang
Alibaba
Keyu Fan
Tsinghua University
Weihao Ye
Xiamen University
Jing Xiong
The University of Hong Kong
Natural Language Processing · Automated Theorem Proving
Hui Shen
University of Michigan, Ph.D. Student in Computer Science (2025.9-?)
Efficient AI · Generative Model · Machine Learning System
Chaofan Tao
The University of Hong Kong
Efficient ML · Natural Language Processing · Multimodal
Taiqiang Wu
University of Hong Kong | Tsinghua University
Model Compression · Efficient Methods
Zhongwei Wan
The Ohio State University, PhD student
LLM · Multimodal · NLP
Yulei Qian
Meituan LongCat Team
Yuchen Xie
Meituan LongCat Team
Ngai Wong
The University of Hong Kong