🤖 AI Summary
This work addresses the prevalent issue of object hallucination in large vision-language models (LVLMs), which undermines their reliability in real-world applications. The authors propose a training-free, hierarchical adaptive weight-editing method that introduces a Hallucination Insensitivity Score (HIS), a metric for precisely identifying the decoder layers most susceptible to hallucination and guiding targeted intervention on them. By selectively adjusting these layers, the approach suppresses hallucinatory outputs while preserving the model's pretrained knowledge. Compatible with mainstream backbones such as Qwen, LLaMA, and Vicuna, the method reduces hallucination rates by an average of 61.8% across multiple benchmarks (including CHAIR, MME, and GPT-4V-assisted evaluation) without introducing additional parameters, inference latency, or computational overhead.
📝 Abstract
Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal understanding capabilities, yet they remain prone to object hallucination, where models describe non-existent objects or attribute incorrect factual information, raising serious concerns for reliable real-world deployment. While fine-tuning is a commonly adopted mitigation strategy, its high computational cost and practical difficulty motivate the need for training-free alternatives, among which model editing has recently emerged as a promising direction. However, indiscriminate editing risks disrupting the rich implicit knowledge encoded in pre-trained LVLMs, leading to a fundamental question: how much intervention is necessary at each layer to suppress hallucinations while preserving pre-trained knowledge? To address this question, we present a systematic analysis of LVLM decoders built on three widely used large language model backbones (Qwen, LLaMA, and Vicuna), revealing clear layer-wise differences in susceptibility to object hallucination. Building on these insights, we introduce the Hallucination Insensitivity Score (HIS), a principled metric that quantifies each layer's sensitivity to hallucination and provides guidance for targeted intervention. Leveraging HIS, we propose Hallucination Insensitivity Model Editing (HIME), a simple yet effective layer-adaptive weight-editing approach that selectively modifies latent features to suppress hallucinations while preserving pre-trained knowledge. Extensive experiments demonstrate that HIME reduces hallucinations by an average of 61.8% across open-ended generation benchmarks, including CHAIR, MME, and GPT-4V-aided evaluation, without introducing additional parameters, inference-time latency, or computational overhead.
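The abstract does not specify the exact editing operation, but the core idea of layer-adaptive, score-guided weight editing can be illustrated with a minimal sketch. Everything here is hypothetical: the per-layer `his_scores` are assumed to be precomputed in [0, 1] (high = insensitive to hallucination, so left largely untouched), and the edit itself is an illustrative rank-1 projection that attenuates a putative "hallucination direction" in each layer's weights, not the paper's actual HIME procedure.

```python
import numpy as np

def his_guided_edit(layer_weights, his_scores, direction, alpha=1.0):
    """Hypothetical sketch of HIS-guided layer-adaptive editing.

    Layers with a low Hallucination Insensitivity Score (i.e. more
    hallucination-prone) receive a stronger edit; layers with HIS = 1
    are left unchanged. The 'edit' here is illustrative: it removes the
    weight component along an assumed hallucination direction `d`.
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)          # unit hallucination direction (assumed known)
    P = np.outer(d, d)                 # rank-1 projector onto that direction
    edited = []
    for W, his in zip(layer_weights, his_scores):
        strength = alpha * (1.0 - his)           # adaptive per-layer edit strength
        edited.append(W - strength * (W @ P))    # attenuate component along d
    return edited
```

With `his = 1.0` the layer's weights pass through unmodified, matching the goal of preserving pre-trained knowledge in insensitive layers; with `his = 0.0` and `alpha = 1.0` the component along the assumed direction is fully removed.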