🤖 AI Summary
This work addresses the vulnerability of deep neural networks to adversarial attacks in image classification by proposing a pixel-level adversarial perturbation method that jointly enforces sparsity, low magnitude, and strong imperceptibility. The authors unify an ℓ₀-sparsity constraint, magnitude bounds, and perceptual-consistency regularization in a single optimization framework, and introduce the Frank–Wolfe (conditional gradient) algorithm to jointly optimize both the perturbation's support (location) and its amplitude, with a theoretical convergence rate of O(1/√T). Adversarial examples generated end-to-end on ImageNet are markedly harder for humans to perceive and are more visually interpretable, and the method achieves higher attack success rates and better sparsity efficiency than state-of-the-art sparse adversarial attacks.
📝 Abstract
Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of calculated small distortion to images, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
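To make the abstract's algorithmic idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a Frank–Wolfe loop whose linear minimization oracle jointly enforces sparsity and bounded magnitude: each step places mass `-eps * sign(grad)` on the `k` coordinates of largest gradient magnitude, i.e. the LMO over the convex hull of $\{\delta : \|\delta\|_0 \le k,\ \|\delta\|_\infty \le \epsilon\}$, then takes the standard $\gamma_t = 2/(t+2)$ convex-combination step. The gradient callback, dimensions, and step schedule are illustrative assumptions.

```python
import numpy as np

def sparse_lmo(grad, eps, k):
    """LMO over conv{delta : ||delta||_0 <= k, ||delta||_inf <= eps}.

    For a linear objective <grad, s>, the minimizer puts -eps * sign(grad)
    on the k coordinates with the largest |grad| and zero elsewhere.
    """
    s = np.zeros_like(grad)
    idx = np.argsort(-np.abs(grad))[:k]   # k largest-magnitude gradient coords
    s[idx] = -eps * np.sign(grad[idx])
    return s

def frank_wolfe_sparse(grad_fn, dim, eps, k, T=1000):
    """Illustrative Frank-Wolfe loop for a k-sparse, eps-bounded perturbation.

    grad_fn(delta) must return the gradient of the attack loss at delta;
    here it is an assumed user-supplied callback, not SAIF's actual loss.
    """
    delta = np.zeros(dim)
    for t in range(T):
        g = grad_fn(delta)
        s = sparse_lmo(g, eps, k)          # vertex of the constraint set
        gamma = 2.0 / (t + 2)              # standard O(1/T)-rate step size
        delta = (1.0 - gamma) * delta + gamma * s  # stays feasible (convex comb.)
    return delta
```

Because every iterate is a convex combination of LMO vertices, feasibility (magnitude bound and, in the limit, sparsity) is maintained without projection, which is the usual motivation for conditional-gradient methods in constrained attack generation.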