🤖 AI Summary
This work identifies a critical vulnerability in vision-language models (VLMs) during autoregressive generation: a small fraction of tokens (about 20%), those with high entropy, disproportionately governs the output trajectory, which makes these positions effective targets for adversarial manipulation. Method: The authors propose Entropy-bank Guided Adversarial attacks (EGA), which uses token-level entropy to locate critical decision points and concentrates perturbations on those positions rather than on every decoding step; because the same high-entropy forks recur across architecturally diverse VLMs, the attack also transfers to unseen models. Results: Selective attacks convert 35–49% of benign VLM outputs into harmful content and reach 17–26% harmful rates on unseen target VLMs, while EGA attains attack success rates of 93–95%. The work links entropy dynamics during generation to VLM security and exposes weaknesses in current VLM safety mechanisms.
📝 Abstract
Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is strongly correlated with the reliability of VLMs. Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token contributes equally to generation instability. We show instead that a small fraction (about 20%) of high-entropy tokens, i.e., critical decision points in autoregressive generation, disproportionately governs output trajectories. By concentrating adversarial perturbations on these positions, we achieve semantic degradation comparable to global methods while using substantially smaller budgets. More importantly, across multiple representative VLMs, such selective attacks convert 35-49% of benign outputs into harmful ones, exposing a more critical safety risk. Remarkably, these vulnerable high-entropy forks recur across architecturally diverse VLMs, enabling feasible transferability (17-26% harmful rates on unseen targets). Motivated by these findings, we propose Entropy-bank Guided Adversarial attacks (EGA), which achieves competitive attack success rates (93-95%) alongside high harmful conversion, thereby revealing new weaknesses in current VLM safety mechanisms.
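
The notion of "high-entropy tokens" above can be made concrete with a small analysis sketch. The code below is not the authors' EGA implementation and does not cover the entropy bank or any perturbation step; it only shows, under stated assumptions, how per-step Shannon entropy might be computed from next-token logits and how the top 20% of positions could be flagged. The function names `token_entropies` and `high_entropy_mask`, the `(T, V)` logits layout, and the top-fraction selection rule are all hypothetical choices made for illustration.

```python
import numpy as np


def token_entropies(logits):
    """Shannon entropy (in nats) of the next-token distribution at each step.

    logits: array of shape (T, V), assumed to hold one row of vocabulary
    logits per decoding step (hypothetical layout).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)


def high_entropy_mask(entropies, fraction=0.2):
    """Boolean mask marking the top `fraction` of positions by entropy.

    The 0.2 default mirrors the "about 20%" figure in the abstract; the
    exact selection rule used in the paper is an assumption here.
    """
    entropies = np.asarray(entropies)
    k = max(1, int(np.ceil(fraction * len(entropies))))
    top = np.argsort(entropies)[-k:]
    mask = np.zeros(len(entropies), dtype=bool)
    mask[top] = True
    return mask


# Toy example: 10 decoding steps over a 4-token vocabulary.
# Steps 2 and 7 are uniform (maximally uncertain); the rest are peaked.
toy_logits = np.zeros((10, 4))
for step in range(10):
    if step not in (2, 7):
        toy_logits[step, 0] = 10.0

toy_mask = high_entropy_mask(token_entropies(toy_logits))
print(np.flatnonzero(toy_mask))
```

In this toy setup the mask picks out the two uniform steps, which play the role of the "forks" described in the abstract; a global method would instead treat all ten positions alike.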