🤖 AI Summary
This work addresses the robustness evaluation of deep neural networks under ℓ₀-norm adversarial attacks. Because the ℓ₀ norm is non-convex and non-differentiable, the authors propose σ-zero, a gradient-based attack that replaces the ℓ₀ norm with a differentiable approximation and uses an adaptive projection operator to dynamically adjust the trade-off between loss minimization and perturbation sparsity, enabling an end-to-end search for minimum-ℓ₀ perturbations. The attack requires no time-consuming hyperparameter tuning. Experiments on MNIST, CIFAR-10, and ImageNet, covering both robust and non-robust models, show that it outperforms competing sparse attacks in success rate, perturbation size (i.e., smaller ℓ₀ norm), and efficiency. The authors also argue that ℓ₀ attacks can reveal weaknesses that conventional ℓ₂- and ℓ∞-norm attacks leave untested.
📝 Abstract
Evaluating the adversarial robustness of deep networks to gradient-based attacks is challenging. While most attacks consider $\ell_2$- and $\ell_\infty$-norm constraints to craft input perturbations, only a few investigate sparse $\ell_1$- and $\ell_0$-norm attacks. In particular, $\ell_0$-norm attacks remain the least studied due to the inherent complexity of optimizing over a non-convex and non-differentiable constraint. However, evaluating adversarial robustness under these attacks could reveal weaknesses otherwise left untested with more conventional $\ell_2$- and $\ell_\infty$-norm attacks. In this work, we propose a novel $\ell_0$-norm attack, called $\sigma$\texttt{-zero}, which leverages a differentiable approximation of the $\ell_0$ norm to facilitate gradient-based optimization, and an adaptive projection operator to dynamically adjust the trade-off between loss minimization and perturbation sparsity. Extensive evaluations using MNIST, CIFAR-10, and ImageNet datasets, involving robust and non-robust models, show that $\sigma$\texttt{-zero} finds minimum $\ell_0$-norm adversarial examples without requiring any time-consuming hyperparameter tuning, and that it outperforms all competing sparse attacks in terms of success rate, perturbation size, and efficiency.
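To make the idea of a differentiable approximation of the ℓ₀ norm concrete, here is a minimal sketch in Python. The abstract does not state the exact formula, so the surrogate used below, the sum over components of δᵢ² / (δᵢ² + σ), is an assumption chosen because it is a common smooth stand-in for counting nonzero entries. The function names (`l0_exact`, `l0_smooth`, `l0_smooth_grad`, `sparsify`), the default value of `sigma`, and the fixed threshold `tau` are all hypothetical and used only for illustration. In particular, `sparsify` is a plain hard-thresholding step, not the adaptive projection operator described above.

```python
import numpy as np


def l0_exact(delta):
    """True l0 norm: the number of nonzero entries (not differentiable)."""
    return int(np.count_nonzero(delta))


def l0_smooth(delta, sigma=1e-3):
    """Assumed smooth surrogate: sum_i delta_i^2 / (delta_i^2 + sigma).

    Each term is close to 1 when |delta_i| is much larger than sqrt(sigma)
    and close to 0 when delta_i is near zero.
    """
    sq = delta ** 2
    return float(np.sum(sq / (sq + sigma)))


def l0_smooth_grad(delta, sigma=1e-3):
    """Analytic gradient of the surrogate: 2*sigma*delta_i / (delta_i^2 + sigma)^2."""
    sq = delta ** 2
    return 2.0 * sigma * delta / (sq + sigma) ** 2


def sparsify(delta, tau):
    """Illustrative hard threshold: zero out entries with magnitude below tau."""
    return np.where(np.abs(delta) >= tau, delta, 0.0)


# Toy perturbation with three nonzero entries, one of them very small.
delta = np.array([0.0, 0.5, -0.8, 0.01, 0.0])
exact = l0_exact(delta)                # counts every nonzero entry equally
approx = l0_smooth(delta)              # the tiny entry contributes only a little
pruned = sparsify(delta, tau=0.05)     # drops the tiny entry entirely
```

Because the surrogate has a well-defined gradient everywhere, it can be added to an attack loss and minimized with standard gradient descent, whereas the exact count has zero gradient almost everywhere. Smaller values of `sigma` track the true count more closely but make the surrogate steeper near zero.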