🤖 AI Summary
This work addresses the susceptibility of large language models to attention drift during generation, which often leads to hallucinations and context forgetting. The authors propose SinkTrack, a novel, training-free, plug-and-play context anchoring mechanism that, for the first time, repurposes the beginning-of-sequence (<BOS>) token as a programmable information anchor. By injecting key contextual features into this anchor through attention-aware representation modification, SinkTrack effectively stabilizes context retention without requiring model retraining. The method is grounded in attention analysis and is compatible with both multimodal and text-only tasks. Experimental results demonstrate consistent performance gains across diverse model architectures and scales, with accuracy improvements of 21.6% on SQuAD2.0 (Llama3.1-8B-Instruct) and 22.8% on M3CoT (Qwen2.5-VL-7B-Instruct).
📝 Abstract
Large language models (LLMs) suffer from hallucination and context forgetting. Prior studies suggest that attention drift is a primary cause of these problems, where an LLM's focus shifts towards newly generated tokens and away from the initial input context. To counteract this, we make use of a related, intrinsic characteristic of LLMs: attention sink -- the tendency to consistently allocate high attention to the very first token (i.e., <BOS>) of a sequence. Concretely, we propose an advanced context anchoring method, SinkTrack, which treats <BOS> as an information anchor and injects key contextual features (such as those derived from the input image or instruction) into its representation. As such, the LLM remains anchored to the initial input context throughout the entire generation process. SinkTrack is training-free, plug-and-play, and introduces negligible inference overhead. Experiments demonstrate that SinkTrack mitigates hallucination and context forgetting across both textual (e.g., +21.6% on SQuAD2.0 with Llama3.1-8B-Instruct) and multi-modal (e.g., +22.8% on M3CoT with Qwen2.5-VL-7B-Instruct) tasks. Its consistent gains across different architectures and scales underscore its robustness and generalizability. We also analyze its underlying working mechanism from the perspective of information delivery. Our source code is available at https://github.com/67L1/SinkTrack.
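The abstract describes the core idea at a high level: pool salient context features and fold them into the <BOS> token's representation so the attention sink carries context throughout generation. The minimal NumPy sketch below illustrates only that general idea; the function name `inject_anchor`, the attention-weighted pooling scheme, and the mixing coefficient `alpha` are all illustrative assumptions, not the paper's actual attention-aware update (see the linked repository for the real implementation).

```python
import numpy as np

def inject_anchor(hidden, attn_weights, alpha=0.5):
    """Blend a pooled summary of the context into the <BOS> slot.

    hidden:       (seq_len, d) token representations; row 0 is <BOS>.
    attn_weights: (seq_len,) importance scores over context tokens
                  (e.g. attention each token receives; hypothetical here).
    alpha:        mixing coefficient (hypothetical hyperparameter).
    """
    w = attn_weights / attn_weights.sum()   # normalize scores to a distribution
    context_summary = w @ hidden            # weighted pooling over tokens -> (d,)
    out = hidden.copy()
    # Only the <BOS> representation is modified; other tokens are untouched.
    out[0] = (1.0 - alpha) * hidden[0] + alpha * context_summary
    return out
```

Because only one vector is rewritten and no parameters are trained, a scheme of this shape would be plug-and-play with negligible inference overhead, consistent with the properties the abstract claims for SinkTrack.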