Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

📅 2025-12-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a critical vulnerability in vision-language models (VLMs) during autoregressive generation: a small fraction (~20%) of tokens, the high-entropy ones, disproportionately governs output trajectories, making these positions highly susceptible to adversarial manipulation. Method: the authors propose Entropy-bank Guided Adversarial attacks (EGA), which uses entropy-sensitive decoding analysis to model token-level uncertainty and injects perturbations selectively at those token positions; because the same high-entropy forks recur across architecturally diverse VLMs, EGA also yields a transferable attack paradigm. Results: EGA converts 35–49% of benign VLM outputs into harmful content, transfers to unseen VLM architectures with 17–26% harmful rates, and achieves an overall attack success rate of 93–95%. The work uncovers a link between generative entropy dynamics and VLM security, introducing an entropy-aware, token-focused adversarial attack paradigm for evaluating and strengthening VLM robustness and safety.
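The selection step the summary describes, scoring each decoding position by the entropy of its next-token distribution and keeping the most uncertain ~20%, can be sketched directly. Below is a minimal PyTorch illustration, assuming per-step logits from one autoregressive pass are available; the function names and the default fraction are illustrative, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def token_entropies(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (nats) of the next-token distribution at each step.

    logits: (seq_len, vocab_size) pre-softmax scores from one
    autoregressive decoding pass.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

def high_entropy_positions(logits: torch.Tensor, frac: float = 0.2) -> torch.Tensor:
    """Indices of the top-`frac` highest-entropy decoding steps, i.e. the
    ~20% of 'critical decision points' the paper reports as governing
    output trajectories."""
    entropies = token_entropies(logits)
    k = max(1, int(frac * entropies.numel()))
    return torch.topk(entropies, k).indices
```

Spending the perturbation budget only on these positions is what lets the attack match global entropy-maximization methods at a substantially smaller budget.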

📝 Abstract
Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is strongly correlated with the reliability of VLMs. Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token contributes equally to generation instability. We show instead that a small fraction (about 20%) of high-entropy tokens, i.e., critical decision points in autoregressive generation, disproportionately governs output trajectories. By concentrating adversarial perturbations on these positions, we achieve semantic degradation comparable to global methods while using substantially smaller budgets. More importantly, across multiple representative VLMs, such selective attacks convert 35-49% of benign outputs into harmful ones, exposing a more critical safety risk. Remarkably, these vulnerable high-entropy forks recur across architecturally diverse VLMs, enabling feasible transferability (17-26% harmful rates on unseen targets). Motivated by these findings, we propose Entropy-bank Guided Adversarial attacks (EGA), which achieves competitive attack success rates (93-95%) alongside high harmful conversion, thereby revealing new weaknesses in current VLM safety mechanisms.
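To make the selective-perturbation idea concrete, here is a hedged PGD-style sketch that maximizes next-token entropy only at the currently most uncertain decoding steps. The `model(image, text_ids)` interface returning per-step logits, the L-infinity budget, and the entropy-ascent loss are assumptions for illustration; the paper's exact EGA objective and its entropy-bank mechanism may differ.

```python
import torch
import torch.nn.functional as F

def entropies(logits: torch.Tensor) -> torch.Tensor:
    # Shannon entropy of each next-token distribution, shape (seq_len,).
    lp = F.log_softmax(logits, dim=-1)
    return -(lp.exp() * lp).sum(dim=-1)

def selective_entropy_attack(model, image, text_ids,
                             steps=100, eps=8 / 255, alpha=1 / 255, frac=0.2):
    """PGD-style L-inf image perturbation concentrated on high-entropy steps.

    Assumes `model(image, text_ids)` returns (seq_len, vocab_size) logits
    for a teacher-forced autoregressive pass -- a hypothetical interface,
    not the paper's actual API.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logits = model(image + delta, text_ids)
        ent = entropies(logits)
        # Re-select the ~20% most uncertain decoding steps each iteration.
        k = max(1, int(frac * ent.numel()))
        idx = torch.topk(ent.detach(), k).indices
        # Raise uncertainty only at those critical forks.
        loss = ent[idx].mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient-ascent step
            delta.clamp_(-eps, eps)              # stay inside the L-inf ball
            delta.grad = None
    return (image + delta).clamp(0, 1).detach()
```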
Problem

Research questions and friction points this paper is trying to address.

Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming every token contributes equally to generation instability
Uniform perturbation budgets are spent on low-entropy positions that barely affect output trajectories
Current VLM safety mechanisms are untested against attacks that target only the critical decision points in generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identifies the ~20% of high-entropy tokens that disproportionately govern output trajectories
Entropy-bank Guided Adversarial attacks (EGA): perturbations concentrated on these decision points, matching global methods at substantially smaller budgets
High benign-to-harmful conversion (35-49%) and cross-architecture transfer (17-26%), alongside 93-95% attack success
Mengqi He
Australian National University
Xinyu Tian
Australian National University
Xin Shen
The University of Queensland
Jinhong Ni
Australian National University
Shu Zou
Australian National University
Zhaoyuan Yang
GE Research
Jing Zhang
Australian National University