AI Summary
Large Vision-Language Models (LVLMs) are prone to language bias, which induces object hallucinations at the levels of categories, attributes, and relationships, undermining trustworthy AI applications. To address this issue, this work proposes the first two-stage framework that explicitly integrates factual semantics into activation editing. The approach first aligns vision-text semantics through Factual-Augmented Activation Steering (FAS) and then dynamically refines internal activations via Query-Adaptive Offset Optimization (QAO), enabling fine-grained, adaptive, fact-guided correction. Evaluated on standard hallucination benchmarks such as AMBER, the method reduces hallucination rates by up to 16.3% across multiple mainstream LVLMs, significantly outperforming existing baselines.
Abstract
Large Vision-Language Models (LVLMs) have achieved substantial progress in cross-modal tasks. However, due to language bias, LVLMs are susceptible to object hallucination, which can be primarily divided into category, attribute, and relation hallucination, significantly impeding trustworthy AI applications. Editing the internal activations of LVLMs has shown promising effectiveness in mitigating hallucinations at minimal cost. However, previous editing approaches neglect the effective guidance offered by factual textual semantics and thus struggle to explicitly mitigate language bias. To address these issues, we propose Adaptive Factual-guided Visual-Textual Editing for hallucination mitigation (AFTER), which comprises Factual-Augmented Activation Steering (FAS) and Query-Adaptive Offset Optimization (QAO), to adaptively guide the original biased activations towards factual semantics. Specifically, FAS provides factual and general guidance for activation editing, thereby explicitly modeling precise visual-textual associations. Subsequently, QAO introduces a query-aware offset estimator to derive query-specific edits from the general steering vector, enhancing the diversity and granularity of editing. Extensive experiments on standard hallucination benchmarks across three widely adopted LVLMs validate the efficacy of the proposed AFTER, notably achieving up to a 16.3% reduction in hallucination over baselines on the AMBER benchmark. Our code and data will be released for reproducibility.
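To make the two-stage idea concrete, the following is a minimal NumPy sketch of how a general steering vector (FAS) combined with a query-specific offset (QAO) could edit a hidden activation. All names, dimensions, and the linear offset estimator are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # toy hidden-state dimensionality

# FAS (assumed form): a single general steering vector pointing from
# biased activations toward factual semantics, e.g. the difference of
# mean activations over factual vs. hallucinated responses.
factual_acts = rng.normal(0.5, 1.0, size=(16, HIDDEN))
biased_acts = rng.normal(-0.5, 1.0, size=(16, HIDDEN))
steering_vec = factual_acts.mean(axis=0) - biased_acts.mean(axis=0)

def query_offset(query_emb: np.ndarray, W: np.ndarray) -> np.ndarray:
    """QAO (assumed form): a query-aware offset estimator; here a toy
    linear map with a tanh nonlinearity."""
    return np.tanh(W @ query_emb)

def edit_activation(h, query_emb, W, alpha=0.1):
    """Shift a hidden activation along the general factual steering
    direction, refined by a query-specific offset."""
    return h + alpha * (steering_vec + query_offset(query_emb, W))

W = 0.1 * rng.normal(size=(HIDDEN, HIDDEN))  # offset-estimator weights
h = rng.normal(size=HIDDEN)   # one token's hidden activation
q = rng.normal(size=HIDDEN)   # pooled query embedding
h_edited = edit_activation(h, q, W)
print(h_edited.shape)  # (8,)
```

Under this sketch, FAS supplies the shared direction `steering_vec`, while QAO's `query_offset` varies per query, which is what lets the edit adapt in granularity rather than applying one fixed shift to every input.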