🤖 AI Summary
This work addresses the problem of hallucinations in current vision-language models caused by object-hiding attacks that induce semantic discontinuities, leading models to generate plausible yet incorrect content. The authors propose a background-consistent object-hiding method that re-encodes target regions to align statistically and semantically with their surrounding context, thereby avoiding the representational voids that trigger hallucinations. Notably, this study is the first to demonstrate that such hallucinations stem from semantic inconsistency rather than mere object absence, and it introduces a hallucination-free hiding mechanism that preserves token structure and attention flow. Leveraging a pixel-level optimization framework, the method enforces background-consistent re-encoding across multiple Transformer layers while maintaining global scene semantics. Experiments show that the approach effectively hides targets in mainstream vision-language models, retains up to 86% of non-target objects, and reduces grounded hallucinations to as little as one-third of those produced by existing attention-suppression methods.
📝 Abstract
Vision-language models (VLMs) have recently shown remarkable capabilities in visual understanding and generation, but remain vulnerable to adversarial manipulations of visual content. Prior object-hiding attacks primarily rely on suppressing or blocking region-specific representations, often creating semantic gaps that inadvertently induce hallucination, where models invent plausible but incorrect objects. In this work, we demonstrate that hallucination arises not from object absence per se, but from semantic discontinuity introduced by such suppression-based attacks. We propose a new class of \emph{background-consistent object concealment} attacks, which hide target objects by re-encoding their visual representations to be statistically and semantically consistent with surrounding background regions. Crucially, our approach preserves token structure and attention flow, avoiding representational voids that trigger hallucination. We present a pixel-level optimization framework that enforces background-consistent re-encoding across multiple transformer layers while preserving global scene semantics. Extensive experiments on state-of-the-art vision-language models show that our method effectively conceals target objects while preserving up to $86\%$ of non-target objects and reducing grounded hallucination by up to $3\times$ compared to attention-suppression-based attacks.
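The core idea of background-consistent re-encoding can be illustrated with a toy sketch: optimize a per-pixel perturbation so that the target region's feature statistics match those of the surrounding background, rather than suppressing the region outright. The sketch below is a simplified assumption-laden illustration, not the paper's actual framework; the real method operates on token representations across multiple transformer layers of a VLM, whereas here the "encoder" is just a mean-intensity feature and the names (`feature`, `conceal`) are hypothetical.

```python
# Toy illustration of background-consistent concealment (assumed setup;
# the paper's framework optimizes over VLM transformer-layer features).

def feature(region):
    # Stand-in "encoder": mean intensity of the region (hypothetical;
    # a real VLM would produce multi-layer token embeddings here).
    return sum(region) / len(region)

def conceal(target, background, lr=0.5, steps=200):
    """Gradient-descend a per-pixel perturbation so the target region's
    feature matches the background feature (statistic matching)."""
    bg_feat = feature(background)
    delta = [0.0] * len(target)
    for _ in range(steps):
        cur = feature([t + d for t, d in zip(target, delta)])
        # Analytic gradient of (cur - bg_feat)^2 w.r.t. each delta_i:
        # d/d(delta_i) = 2 * (cur - bg_feat) / n
        grad = 2.0 * (cur - bg_feat) / len(target)
        delta = [d - lr * grad for d in delta]
    return delta

target = [200.0, 210.0, 190.0, 205.0]   # bright "object" pixels
background = [50.0, 55.0, 45.0, 52.0]   # darker background pixels
delta = conceal(target, background)
adv = [t + d for t, d in zip(target, delta)]
# After optimization, the perturbed region is statistically
# indistinguishable from the background under this toy encoder.
print(round(feature(adv), 2), round(feature(background), 2))  # → 50.5 50.5
```

Because the perturbed region still yields valid, background-like features rather than a representational void, downstream attention has nothing anomalous to latch onto, which is the intuition behind why this style of concealment avoids hallucination.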