Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

📅 2025-12-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work identifies a vulnerability in vision-language models (VLMs) during autoregressive generation: a small fraction of tokens (about 20%), those with high entropy, disproportionately governs the output trajectory, which makes these positions natural targets for adversarial manipulation. Method: The authors propose Entropy-bank Guided Adversarial attacks (EGA), which concentrate perturbations on these high-entropy decision points instead of maximizing uncertainty at every decoding step. Results: Across multiple representative VLMs, the selective attacks convert 35-49% of benign outputs into harmful ones and transfer to unseen targets with 17-26% harmful rates, while EGA reaches attack success rates of 93-95%. Semantic degradation is comparable to that of global methods at substantially smaller perturbation budgets, which reveals new weaknesses in current VLM safety mechanisms.

📝 Abstract

Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is strongly correlated with the reliability of VLMs. Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token contributes equally to generation instability. We show instead that a small fraction (about 20%) of high-entropy tokens, i.e., critical decision points in autoregressive generation, disproportionately governs output trajectories. By concentrating adversarial perturbations on these positions, we achieve semantic degradation comparable to global methods while using substantially smaller budgets. More importantly, across multiple representative VLMs, such selective attacks convert 35-49% of benign outputs into harmful ones, exposing a more critical safety risk. Remarkably, these vulnerable high-entropy forks recur across architecturally diverse VLMs, enabling feasible transferability (17-26% harmful rates on unseen targets). Motivated by these findings, we propose Entropy-bank Guided Adversarial attacks (EGA), which achieve competitive attack success rates (93-95%) alongside high harmful conversion, thereby revealing new weaknesses in current VLM safety mechanisms.
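
To make the "about 20% of high-entropy tokens" idea concrete, here is a minimal sketch of the measurement side only: computing per-step Shannon entropy from next-token distributions and marking the top fraction of positions. It does not implement EGA or any perturbation step, which the abstract does not specify. The function names (`token_entropies`, `high_entropy_mask`, `masked_mean_entropy`), the toy probabilities, and the use of a simple top-fraction rule are all assumptions made for illustration.

```python
import math

def token_entropies(step_probs):
    """Shannon entropy (in nats) of each decoding step's next-token distribution."""
    return [-sum(p * math.log(p) for p in probs if p > 0.0) for probs in step_probs]

def high_entropy_mask(entropies, fraction=0.2):
    """Mark the top `fraction` of positions by entropy (always at least one).

    Assumption: a plain top-fraction rule; the paper's selection rule may differ.
    """
    k = max(1, math.ceil(fraction * len(entropies)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    selected = set(ranked[:k])
    return [i in selected for i in range(len(entropies))]

def masked_mean_entropy(entropies, mask):
    """Average entropy over the selected positions only.

    A global objective would average over every position instead.
    """
    chosen = [h for h, m in zip(entropies, mask) if m]
    return sum(chosen) / len(chosen)

# Toy next-token distributions over a 4-token vocabulary for five decoding steps
# (made-up numbers, not taken from any model).
step_probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident step
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain "fork"
    [0.90, 0.05, 0.03, 0.02],
    [0.70, 0.10, 0.10, 0.10],
    [0.85, 0.05, 0.05, 0.05],
]

entropies = token_entropies(step_probs)
mask = high_entropy_mask(entropies, fraction=0.2)
selective_objective = masked_mean_entropy(entropies, mask)
global_objective = sum(entropies) / len(entropies)
```

With five steps and `fraction=0.2`, exactly one position is selected: the uniform step, whose entropy is ln 4. The selective objective therefore looks only at that fork, while the global average is diluted by the confident steps, which is the contrast the abstract draws between selective and all-step entropy maximization.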

Problem

Research questions and friction points this paper is trying to address.

Targets high-entropy tokens for efficient adversarial attacks

Concentrates perturbations on critical decision points in generation

Exposes safety risks in vision-language models via selective attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Targets high-entropy tokens for adversarial attacks

Uses entropy-bank guided perturbations for efficiency

Achieves high harmful conversion rates on VLMs

🔎 Similar Papers

ImgTrojan: Jailbreaking Vision-Language Models with ONE Image