Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination

📅 2026-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses hallucination in large vision-language models (LVLMs), where generated text contradicts the visual input. Using the logit lens, it identifies a Vocabulary Hijacking phenomenon: certain visual tokens, called Inert Tokens, attract disproportionate attention while their hidden states decode to a fixed set of unrelated words (Hijacking Anchors) across layers, a form of semantic rigidity. The authors introduce Hijacking Anchor-Based Identification (HABI) to localize Inert Tokens, and the Non-Hijacked Visual Attention Ratio (NHAR), a metric for identifying attention heads that resist hijacking and are critical for factual accuracy. Building on both, they propose HAVAE, a training-free intervention that strengthens the attention of these heads toward salient visual content. Experiments across multiple benchmarks show that HAVAE significantly reduces hallucinations with no additional computational overhead while preserving the model's general capabilities.
📝 Abstract
Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal tasks, yet their reliability is persistently undermined by hallucinations, that is, generated text that contradicts the visual input. Recent studies often attribute these errors to inadequate visual attention. In this work, we analyze the attention mechanisms via the logit lens, uncovering a distinct anomaly we term Vocabulary Hijacking. We discover that specific visual tokens, defined as Inert Tokens, disproportionately attract attention. Crucially, when their intermediate hidden states are projected into the vocabulary space, they consistently decode to a fixed set of unrelated words (termed Hijacking Anchors) across layers, revealing a rigid semantic collapse. Leveraging this semantic rigidity, we propose Hijacking Anchor-Based Identification (HABI), a robust strategy to accurately localize these Inert Tokens. To quantify the impact of this phenomenon, we introduce the Non-Hijacked Visual Attention Ratio (NHAR), a novel metric designed to identify attention heads that remain resilient to hijacking and are critical for factual accuracy. Building on these insights, we propose Hijacking-Aware Visual Attention Enhancement (HAVAE), a training-free intervention that selectively strengthens the focus of these identified heads on salient visual content. Extensive experiments across multiple benchmarks demonstrate that HAVAE significantly mitigates hallucinations with no additional computational overhead, while preserving the model's general capabilities. Our code is publicly available at https://github.com/lab-klc/HAVAE.
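The pipeline described in the abstract (logit-lens decoding, anchor-based localization of Inert Tokens, and a per-head ratio of non-hijacked visual attention) can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the abstract does not give the exact definitions of HABI or NHAR. The function names, the `min_frac` threshold, the anchor set, and the toy tensors below are all assumptions made for illustration.

```python
import numpy as np


def logit_lens_top1(hidden, unembed):
    """Project hidden states (layers, tokens, d) through an unembedding
    matrix (d, vocab) and return the top-1 vocabulary id per layer and token."""
    return (hidden @ unembed).argmax(axis=-1)


def find_inert_tokens(top1_ids, anchor_ids, min_frac=0.8):
    """Flag tokens that decode to an anchor id in at least min_frac of layers.
    min_frac is a hypothetical threshold, not a value taken from the paper."""
    is_anchor = np.isin(top1_ids, list(anchor_ids))  # (layers, tokens)
    return is_anchor.mean(axis=0) >= min_frac


def non_hijacked_ratio(attn, visual_mask, inert_mask):
    """Per head: attention mass on non-inert visual tokens divided by the
    attention mass on all visual tokens (an assumed reading of NHAR)."""
    visual = attn[:, visual_mask]                # (heads, n_visual)
    total = visual.sum(axis=1)
    clean = visual[:, ~inert_mask].sum(axis=1)
    return clean / np.maximum(total, 1e-12)


# Toy setup: 3 layers, 4 visual tokens, hidden size 4, vocabulary size 5.
# Visual token 0 decodes to vocabulary id 2 in every layer.
top1_target = np.array([[2, 0, 1, 3],
                        [2, 1, 2, 0],
                        [2, 3, 0, 1]])
hidden = np.eye(4)[top1_target]   # one-hot hidden states, shape (3, 4, 4)
unembed = np.eye(4, 5)            # toy unembedding matrix, shape (4, 5)

top1_ids = logit_lens_top1(hidden, unembed)
inert_mask = find_inert_tokens(top1_ids, anchor_ids={2})

# Attention of 2 heads over 6 keys: 4 visual tokens followed by 2 text tokens.
attn = np.array([[0.6, 0.1, 0.05, 0.05, 0.1, 0.1],
                 [0.1, 0.3, 0.20, 0.20, 0.1, 0.1]])
visual_mask = np.array([True, True, True, True, False, False])

ratios = non_hijacked_ratio(attn, visual_mask, inert_mask)
print(inert_mask, ratios)
```

In this toy case the first head spends most of its visual attention on the flagged token and gets a low ratio, while the second head gets a high one. Under the assumed reading, heads like the second are the ones a HAVAE-style intervention would target.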
Problem

Research questions and friction points this paper is trying to address.

hallucination
Large Vision-Language Models
visual attention
Vocabulary Hijacking
Inert Tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vocabulary Hijacking
Inert Tokens
Hijacking Anchor-Based Identification
Non-Hijacked Visual Attention Ratio
HAVAE
🔎 Similar Papers
2024-10-06 | Conference on Empirical Methods in Natural Language Processing | Citations: 33