AFTER: Mitigating the Object Hallucination of LVLM via Adaptive Factual-Guided Activation Editing

πŸ“… 2026-01-05
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large vision-language models (LVLMs) are prone to language bias, which induces object hallucinations at the levels of categories, attributes, and relationships, thereby undermining trustworthy AI applications. To address this issue, this work proposes the first two-stage framework that explicitly integrates factual semantics into activation editing. The approach first aligns vision–text semantics through Fact-Augmented Activation Steering (FAS) and then dynamically refines internal activations via Query-Adaptive Offset Optimization (QAO), enabling fine-grained and adaptive fact-guided correction. Evaluated on standard hallucination benchmarks such as AMBER, the method reduces hallucination rates by up to 16.3% across multiple mainstream LVLMs, significantly outperforming existing baselines.

Technology Category

Application Category

πŸ“ Abstract
Large Vision-Language Models (LVLMs) have achieved substantial progress in cross-modal tasks. However, due to language bias, LVLMs are susceptible to object hallucination, which can be primarily divided into category, attribute, and relation hallucination, significantly impeding the trustworthy AI applications. Editing the internal activations of LVLMs has shown promising effectiveness in mitigating hallucinations with minimal cost. However, previous editing approaches neglect the effective guidance offered by factual textual semantics, thereby struggling to explicitly mitigate language bias. To address these issues, we propose Adaptive Factual-guided Visual-Textual Editing for hallucination mitigation (AFTER), which comprises Factual-Augmented Activation Steering (FAS) and Query-Adaptive Offset Optimization (QAO), to adaptively guides the original biased activations towards factual semantics. Specifically, FAS is proposed to provide factual and general guidance for activation editing, thereby explicitly modeling the precise visual-textual associations. Subsequently, QAO introduces a query-aware offset estimator to establish query-specific editing from the general steering vector, enhancing the diversity and granularity of editing. Extensive experiments on standard hallucination benchmarks across three widely adopted LVLMs validate the efficacy of the proposed AFTER, notably achieving up to a 16.3% reduction of hallucination over baseline on the AMBER benchmark. Our code and data will be released for reproducibility.
Problem

Research questions and friction points this paper is trying to address.

object hallucination
language bias
vision-language models
factual semantics
trustworthy AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

activation editing
object hallucination
factual guidance
vision-language models
adaptive steering
πŸ”Ž Similar Papers
No similar papers found.
T
Tianbo Wang
School of Computer Science and Engineering, Beihang University
Y
Yuqing Ma
Institute of Artificial Intelligence, Beihang University
K
Kewei Liao
School of Computer Science and Engineering, Beihang University
Z
Zhange Zhang
Institute of Artificial Intelligence, Beihang University
Simin Li
Simin Li
Beihang University
Reinforcement LearningMulti-Agent LearningAdversarial attackTrustworthy AI
Jinyang Guo
Jinyang Guo
The University of Sydney
Deep LearningEfficient MethodsEdge Computing
X
Xianglong Liu
School of Computer Science and Engineering, Beihang University