🤖 AI Summary
Existing adversarial patch attacks suffer from three key limitations: reliance on white-box assumptions, lack of target-class specificity, and poor visual realism, all of which severely hinder practical deployment. This paper proposes the first context-aware, high-fidelity, targeted adversarial patch generation framework. The method employs a semantic-guided U-Net built upon a conditional GAN and conditioned on the target class, integrated with Grad-CAM-driven attention localization and end-to-end adversarial training, to reliably steer model predictions on input images toward user-specified target classes. The approach simultaneously achieves strong black-box transferability, visual imperceptibility, and precise class-level controllability, overcoming the three major practicality bottlenecks. Extensive evaluations on mainstream CNNs and Vision Transformers demonstrate targeted attack success rates exceeding 99%, significantly outperforming prior white-box and untargeted methods. This work establishes a new benchmark for realistic, controllable adversarial patch generation.
📝 Abstract
Adversarial patch attacks pose a severe threat to deep neural networks, yet most existing approaches rely on unrealistic white-box assumptions, pursue untargeted objectives, or produce visually conspicuous patches that limit real-world applicability. In this work, we introduce a novel framework for fully controllable adversarial patch generation, where the attacker can freely choose both the input image x and the target class y_target, thereby dictating the exact misclassification outcome. Our method combines a generative U-Net design with Grad-CAM-guided patch placement, enabling semantic-aware localization that maximizes attack effectiveness while preserving visual realism. Extensive experiments across convolutional networks (DenseNet-121, ResNet-50) and vision transformers (ViT-B/16, Swin-B/16, among others) demonstrate that our approach achieves state-of-the-art performance across all settings, with attack success rates (ASR) and target-class success (TCS) consistently exceeding 99%.
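The abstract does not spell out how Grad-CAM-guided placement is realized; one plausible minimal sketch is to take a precomputed Grad-CAM saliency map and select the patch location whose window has the highest total activation. The function name `place_patch_by_cam` and the integral-image search are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def place_patch_by_cam(cam: np.ndarray, patch_size: int) -> tuple[int, int]:
    """Return the top-left (row, col) of the patch_size x patch_size window
    with the highest total saliency in `cam`.

    `cam` is assumed to be a 2D Grad-CAM heatmap already resized to the
    input resolution (how the map is obtained is outside this sketch).
    """
    h, w = cam.shape
    k = patch_size
    # 2D integral image so every window sum costs O(1).
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = cam.cumsum(axis=0).cumsum(axis=1)
    # Sum of each k x k window via inclusion-exclusion on the integral image.
    sums = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
    r, c = np.unravel_index(np.argmax(sums), sums.shape)
    return int(r), int(c)

# Hypothetical usage: a toy 8x8 attention map with one hot spot.
cam = np.zeros((8, 8))
cam[5:7, 2:4] = 1.0                  # most salient 2x2 region
print(place_patch_by_cam(cam, 2))    # -> (5, 2)
```

Placing the patch on the most attended region is one natural way to read "semantic-aware localization that maximizes attack effectiveness"; the paper may equally well place patches away from salient regions to preserve realism.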
Importantly, we show that our method not only outperforms prior white-box attacks and untargeted baselines, but also surpasses existing approaches whose patches exhibit visually detectable artifacts. By simultaneously ensuring realism, targeted control, and black-box applicability (the three most challenging dimensions of patch-based attacks), our framework establishes a new benchmark for adversarial robustness research, bridging the gap between theoretical attack strength and practical stealthiness.