🤖 AI Summary
This work addresses the vulnerability of large vision-language models (LVLMs) to pixel-level adversarial perturbations. Existing defenses mitigate this threat poorly: those designed for traditional computer vision neglect cross-modal alignment, while those tailored to LVLMs incur high computational cost and modify images heavily. To overcome these limitations, the authors propose SIGN (Structure-Induced Guided Neutralization), a defense framework that, for the first time, incorporates structural priors into LVLM protection. SIGN combines Prior Structural Extraction with Dynamic Guided Neutralization to suppress adversarial perturbations without model retraining and while nearly preserving the original visual representations. The framework is lightweight and plug-and-play, requiring only 0.5% pixel modification and 0.16 seconds per image, and it achieves a defense success rate above 87% while keeping benign-task performance close to the original.
📝 Abstract
Image inputs enable Large Vision-Language Models (LVLMs) to perceive fine-grained visual information, but they also introduce a pixel-level attack surface through which adversarial perturbations can elicit unsafe model behaviors. Most existing defenses are designed for traditional computer vision settings and thus often overlook the cross-modal alignment required by LVLMs, leading to degraded performance. Meanwhile, the few defenses tailored to LVLMs often require substantial image modifications and introduce considerable computational overhead, thereby compromising inference quality and efficiency. To address these limitations, we propose Structure-Induced Guided Neutralization (SIGN), a lightweight, plug-and-play defense framework that improves LVLM compatibility via Prior Structural Extraction and achieves efficient perturbation suppression via Dynamic Guided Neutralization. Extensive experiments show that SIGN achieves a defense success rate of over 87% with only 0.5% pixel modification and 0.16 seconds per image, while nearly preserving the original visual representations and benign task performance. Our work offers a lightweight alternative to defenses that require costly model training and highlights the potential of exploiting a vision encoder for efficient adversarial protection. Our code is available at https://anonymous.4open.science/r/SIGN-BCB1.