🤖 AI Summary
This work proposes a lightweight adversarial attack inspired by insights from explainable machine learning, avoiding the computational cost of traditional approaches that rely on gradients or extensive model queries. Drawing on classic edge detection algorithms, the authors design a simple 3×3 convolutional image filter that generates untargeted adversarial examples in a single pass over the input, with no gradient computation or generative model required at attack time. Compared with related approaches that use generative models, the filter reduces the parameter count by five orders of magnitude, and the learned filters share structure with classic image filters. In the authors' experiments, 3×3 filters reach success rates of 30%–80% on different neural networks and transfer well between models, offering further insight into the vulnerability of neural networks.
📝 Abstract
Adversarial examples in machine learning are typically generated using gradients, obtained either directly through access to the model or approximated via queries to it. In this paper, we propose a much simpler approach to craft adversarial examples, drawing inspiration from insights of explainable machine learning. In particular, we design adversarial image filters that are based on classic edge detection algorithms but optimized to deceive learning models. The resulting untargeted attacks are transferable and require only a single pass over the input. Empirically, we find that 3×3 filters already enable success rates between 30% and 80% on different neural networks. Compared to related approaches using generative models for crafting adversarial examples, we reduce the number of parameters by five orders of magnitude, resulting in a very efficient attack. When investigating the parameters of the learned filters, we observe interesting properties such as a high transferability between models and structures common to classic image filters. Our results provide further insights into the vulnerability of neural networks and their fragility to malicious noise.
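To make the "single pass over the input" concrete, the following is a minimal sketch of applying one 3×3 filter to an image with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `apply_filter_3x3`, the edge-replicating padding, the clipping to [0, 1], and the kernel values are all hypothetical. In particular, the kernel below is a classic Laplacian-based sharpening filter used as a stand-in; the paper's filters are optimized to deceive a model, and their values are not given in the abstract.

```python
import numpy as np

def apply_filter_3x3(image, kernel):
    """Cross-correlate an HxWxC image in [0, 1] with a single 3x3 kernel.

    The same nine weights are applied to every channel (an assumption made
    for this sketch), so the whole "attack" has only nine parameters.
    """
    image = np.asarray(image, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    if kernel.shape != (3, 3):
        raise ValueError("kernel must be 3x3")
    h, w = image.shape[:2]
    # Replicate border pixels so the output keeps the input's size.
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(image)
    # Accumulate the nine shifted, weighted copies of the image.
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w, :]
    # Keep the result a valid image.
    return np.clip(out, 0.0, 1.0)

# Hypothetical stand-in kernel: identity minus a scaled Laplacian edge detector.
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
IDENTITY = np.zeros((3, 3))
IDENTITY[1, 1] = 1.0
HYPOTHETICAL_KERNEL = IDENTITY - 0.5 * LAPLACIAN

# Random data standing in for a real input image.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
filtered = apply_filter_3x3(image, HYPOTHETICAL_KERNEL)
```

Once the nine weights are fixed, producing a perturbed image needs neither gradients nor queries to the target model, which is what makes an attack of this shape cheap to run.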