🤖 AI Summary
This work addresses two problems: existing backdoor attacks on Vision Transformers (ViTs) rely on full fine-tuning, which is computationally expensive and degrades performance, and the security risks of parameter-efficient fine-tuning (PEFT) methods that use dynamic prompting remain unclear. To this end, we propose VIPER, a novel attack framework that uses a lightweight dynamic Visual Prompt Generator (VPG) to embed highly stealthy backdoors while preserving state-of-the-art performance on benign tasks. Our study reveals, for the first time, a “functional fusion” phenomenon in dynamic prompts, in which malicious logic and benign functionality share a sparse set of high-magnitude core parameters, making the backdoor extremely difficult to remove without compromising model performance. Experiments demonstrate that the attack success rate remains nearly 100% even when the VPG is pruned by 90%, and that the VPG adds only 0.06 ms (1.16%) of inference latency.
📝 Abstract
Existing ViT backdoor attacks rely on full fine-tuning that overwrites the backbone, which is computationally expensive and degrades performance. This has pushed adversaries towards the Visual Parameter-Efficient Fine-Tuning (PEFT) paradigm, dominated by adapter-based (e.g., LoRA) and prompt-based (e.g., VPT) approaches. While adapter security has seen initial study, the risks of the burgeoning prompt-based ecosystem remain largely unexplored. We fill this gap, showing how the evolution of VPT towards dynamic, context-aware architectures can enable a far more dangerous, emergent threat, even though these dynamic modules also unlock superior benign performance. We propose VIPER, an attack framework built on a lightweight, dynamic Visual Prompt Generator (VPG) that demonstrates this vulnerability. Critically, this dynamic architecture enables Functional Fusion: an emergent phenomenon in which malicious logic and benign task utility are tightly fused into the same sparse, high-magnitude parameter core. This fusion creates a formidable "hostage" dilemma, as pruning away the attack necessarily destroys the benign performance. Comprehensive evaluations show that VIPER effectively addresses the attacker's trilemma: it achieves state-of-the-art performance on clean data, maintains a near-100% attack success rate (ASR) even under 90% pruning of the VPG module (where LoRA attacks collapse), and adds only an imperceptible 0.06 ms (1.16%) of inference latency. VIPER's results, driven by Functional Fusion, expose a new, paradigm-level risk in dynamic prompt architectures.
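To make the 90% pruning setting concrete, the following is a minimal sketch of unstructured magnitude pruning on a toy weight vector. It is an illustration only: the abstract does not state which pruning criterion was used, so magnitude pruning is an assumption here, and the function names and weight values are hypothetical. The sketch shows why a sparse, high-magnitude parameter core survives aggressive pruning: the smallest entries are removed first, so a few large weights remain untouched.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| fraction set to zero.

    Assumption: unstructured magnitude pruning; the source does not
    specify the pruning criterion applied to the VPG.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(len(weights) * sparsity))
    # Indices ordered by ascending magnitude; the first n_prune are zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def retained_magnitude(weights, pruned):
    """Fraction of the total absolute weight mass that survives pruning."""
    total = sum(abs(w) for w in weights)
    kept = sum(abs(w) for w in pruned)
    return kept / total if total else 0.0


# Hypothetical toy vector: one high-magnitude "core" weight, nine small ones.
weights = [0.01, -0.02, 5.0, 0.03, -0.01, 0.02, -0.015, 0.005, 0.025, -0.03]
pruned = magnitude_prune(weights, sparsity=0.9)
fraction_kept = retained_magnitude(weights, pruned)
```

In this toy case, pruning 90% of the entries leaves only the single large weight, which carries most of the total magnitude. If benign and malicious behaviour both depend on such a core, as Functional Fusion describes, magnitude-based pruning cannot remove one without the other.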