🤖 AI Summary
This work addresses the vulnerability of deep neural networks to adversarial attacks in image classification by proposing a pixel-level adversarial perturbation method that jointly enforces sparsity, low magnitude, and strong imperceptibility. The authors unify an ℓ₀-sparsity constraint, magnitude bounds, and perceptual-consistency regularization in a single optimization framework, and introduce the Frank–Wolfe (conditional gradient) algorithm to jointly optimize both the perturbation's support (location) and its amplitude, with a theoretical convergence rate of O(1/√T). Adversarial examples generated end-to-end on ImageNet are markedly harder for humans to perceive and are more visually interpretable, and the method achieves higher attack success rates and better sparsity efficiency than state-of-the-art sparse adversarial attacks.
📝 Abstract
Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of calculated small distortion to images, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
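To make the abstract's algorithmic idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a Frank–Wolfe loop whose linear minimization oracle jointly enforces sparsity and bounded magnitude: each step places mass `-eps * sign(grad)` on the `k` coordinates of largest gradient magnitude, i.e. the LMO over the convex hull of $\{\delta : \|\delta\|_0 \le k,\ \|\delta\|_\infty \le \epsilon\}$, then takes the standard $\gamma_t = 2/(t+2)$ convex-combination step. The gradient callback, dimensions, and step schedule are illustrative assumptions.

```python
import numpy as np

def sparse_lmo(grad, eps, k):
    """LMO over conv{delta : ||delta||_0 <= k, ||delta||_inf <= eps}.

    For a linear objective <grad, s>, the minimizer puts -eps * sign(grad)
    on the k coordinates with the largest |grad| and zero elsewhere.
    """
    s = np.zeros_like(grad)
    idx = np.argsort(-np.abs(grad))[:k]   # k largest-magnitude gradient coords
    s[idx] = -eps * np.sign(grad[idx])
    return s

def frank_wolfe_sparse(grad_fn, dim, eps, k, T=1000):
    """Illustrative Frank-Wolfe loop for a k-sparse, eps-bounded perturbation.

    grad_fn(delta) must return the gradient of the attack loss at delta;
    here it is an assumed user-supplied callback, not SAIF's actual loss.
    """
    delta = np.zeros(dim)
    for t in range(T):
        g = grad_fn(delta)
        s = sparse_lmo(g, eps, k)          # vertex of the constraint set
        gamma = 2.0 / (t + 2)              # standard O(1/T)-rate step size
        delta = (1.0 - gamma) * delta + gamma * s  # stays feasible (convex comb.)
    return delta
```

Because every iterate is a convex combination of LMO vertices, feasibility (magnitude bound and, in the limit, sparsity) is maintained without projection, which is the usual motivation for conditional-gradient methods in constrained attack generation.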