SAIF: Sparse Adversarial and Imperceptible Attack Framework

📅 2022-12-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of deep neural networks to adversarial attacks in image classification by proposing a pixel-level perturbation method that jointly enforces sparsity, low magnitude, and imperceptibility. The authors unify an ℓ₀-sparsity constraint and a magnitude bound in a single optimization framework, and apply the Frank-Wolfe (conditional gradient) algorithm to jointly optimize the perturbation's support (which pixels are changed) and its amplitude, with a theoretical convergence rate of O(1/√T). Adversarial examples generated on ImageNet are substantially harder for humans to perceive and easier to interpret visually, and the method achieves higher attack success rates and better sparsity than state-of-the-art sparse adversarial attack approaches.
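For context, the conditional gradient step the summary refers to has a standard textbook form (stated here for an attack, i.e. loss maximization; this is the generic update, not reproduced from the paper): at iteration $t$, over a constraint set $\mathcal{C}$,

$$v_t = \arg\max_{v \in \mathcal{C}} \langle \nabla_{\delta} L(\delta_t), v \rangle, \qquad \delta_{t+1} = (1 - \gamma_t)\,\delta_t + \gamma_t v_t, \qquad \gamma_t = \frac{2}{t+2},$$

which stays inside $\mathcal{C}$ by construction (no projection step is needed) and, for nonconvex objectives such as attack losses, reaches a stationary point at rate $O(1/\sqrt{T})$.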
📝 Abstract
Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of calculated small distortion to images, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
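To make the optimization concrete, below is a minimal PyTorch sketch of a Frank-Wolfe sparse attack in the spirit of SAIF. It is an illustration under assumptions, not the authors' implementation: the factorization of the perturbation into a magnitude factor `p` (ℓ∞ ball of radius `eps`) and a support mask `s` (mass on at most `k` pixels) follows the paper's high-level description, while the function name `saif_style_attack` and the default hyperparameters are hypothetical choices for the example.

```python
import torch

def saif_style_attack(model, x, y, eps=0.1, k=500, steps=100):
    """Hypothetical Frank-Wolfe sparse attack sketch (untargeted).

    The perturbation is factorized as delta = p * s, where p carries
    per-pixel magnitudes constrained to the l-inf ball of radius eps,
    and s is a per-pixel support mask constrained to put mass on at
    most k pixels. Each Frank-Wolfe step solves a linear maximization
    oracle (LMO) over each constraint set, so no projection is needed.
    """
    loss_fn = torch.nn.CrossEntropyLoss()
    p = torch.zeros_like(x)                                       # magnitude factor
    s = torch.zeros(x.size(0), 1, *x.shape[2:], device=x.device)  # support mask

    for t in range(steps):
        p = p.detach().requires_grad_(True)
        s = s.detach().requires_grad_(True)
        loss = loss_fn(model(x + p * s), y)          # attack: maximize this loss
        gp, gs = torch.autograd.grad(loss, (p, s))

        # LMO for p over the l-inf ball: eps * sign(gradient).
        v_p = eps * gp.sign()

        # LMO for s over {s in [0,1], sum(s) <= k}: set the k pixels with
        # the largest positive gradient to 1, everything else to 0.
        flat = gs.flatten(1)
        vals, idx = flat.topk(min(k, flat.size(1)), dim=1)
        v_s = torch.zeros_like(flat).scatter_(1, idx, (vals > 0).float())
        v_s = v_s.view_as(s)

        gamma = 2.0 / (t + 2)                        # standard FW step size
        p = (1 - gamma) * p + gamma * v_p
        s = (1 - gamma) * s + gamma * v_s

    return (x + p * s).clamp(0, 1).detach()
```

Here `model` is assumed to be a PyTorch classifier in eval mode taking inputs in [0, 1]; the final clamp keeps the adversarial image in the valid pixel range. Because each update is a convex combination of the current iterate and an LMO vertex, both factors remain feasible throughout, which is the practical appeal of conditional gradient methods for this kind of doubly constrained problem.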
Problem

Research questions and friction points this paper is trying to address.

Develop sparse adversarial attacks on neural networks
Optimize attack perturbations for imperceptibility and sparsity
Reveal classifier vulnerabilities via interpretable adversarial examples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attack via sparse, low-magnitude pixel perturbations
Frank-Wolfe algorithm jointly optimizes sparsity and magnitude
Generates imperceptible, interpretable adversarial examples
Tooba Imtiaz
PhD Candidate, Northeastern University
Computer Vision · Deep Learning · Adversarial Examples
Morgan Kohler
Department of Electrical & Computer Engineering, Northeastern University, Boston MA.
Jared Miller
Department of Electrical & Computer Engineering, Northeastern University, Boston MA.
Zifeng Wang
Department of Electrical & Computer Engineering, Northeastern University, Boston MA.
M. Sznaier
Department of Electrical & Computer Engineering, Northeastern University, Boston MA.
O. Camps
Department of Electrical & Computer Engineering, Northeastern University, Boston MA.
Jennifer G. Dy
Department of Electrical & Computer Engineering, Northeastern University, Boston MA.