🤖 AI Summary
To address the weak transferability of adversarial examples, this paper proposes NAT (Neuron Attack for Transferability), a neuron-level attack method. Unlike conventional embedding-layer perturbations, NAT is the first to target a single critical neuron in an intermediate layer of the source model: a generator is trained to maximize the discrepancy of that neuron's activation between clean and adversarial images, crafting localized perturbations that disrupt core representations and thereby strengthen cross-model transferability. NAT further employs a multi-generator ensemble strategy, enabling efficient black-box attacks with few queries (≤10). Extensive evaluation across 41 ImageNet models and 9 fine-grained classification models shows that NAT improves cross-model attack success rates by over 14% and cross-domain success rates by 4%, significantly outperforming state-of-the-art methods.
📝 Abstract
The generation of transferable adversarial perturbations typically involves training a generator to maximize embedding separation between clean and adversarial images at a single mid-layer of a source model. In this work, we build on this approach and introduce Neuron Attack for Transferability (NAT), a method designed to target a specific neuron within the embedding. Our approach is motivated by the observation that previous layer-level optimizations often disproportionately focus on a few neurons representing similar concepts, leaving other neurons within the attacked layer minimally affected. NAT shifts the focus from embedding-level separation to a more fundamental, neuron-specific approach. We find that targeting individual neurons effectively disrupts the core units of the neural network, providing a common basis for transferability across different models. Through extensive experiments on 41 diverse ImageNet models and 9 fine-grained models, NAT achieves fooling rates that surpass existing baselines by over 14% in cross-model and 4% in cross-domain settings. Furthermore, by leveraging the complementary attacking capabilities of the trained generators, we achieve impressive fooling rates within just 10 queries. Our code is available at: https://krishnakanthnakka.github.io/NAT/
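To make the neuron-specific objective concrete, the core idea can be sketched as a loss that measures the separation of a single mid-layer neuron's (channel's) activation between clean and adversarial inputs, which the generator is trained to maximize. This is a minimal NumPy illustration, not the paper's implementation: the function name `neuron_separation_loss` and the toy feature-map shapes are our own assumptions.

```python
import numpy as np

def neuron_separation_loss(feat_clean, feat_adv, neuron_idx):
    """Hypothetical sketch of a neuron-level separation loss.

    feat_clean, feat_adv: mid-layer feature maps of shape (channels, H, W)
    neuron_idx: index of the single targeted neuron (channel).
    Returns the negative mean squared activation discrepancy, so that
    *minimizing* this loss *maximizes* the targeted neuron's separation.
    """
    a_clean = feat_clean[neuron_idx]
    a_adv = feat_adv[neuron_idx]
    return -np.mean((a_clean - a_adv) ** 2)

# Toy example: perturb only the targeted channel of a random feature map.
rng = np.random.default_rng(0)
clean = rng.standard_normal((64, 8, 8))
adv = clean.copy()
adv[5] += 1.0  # a perturbation that shifts channel 5's activation by 1

print(neuron_separation_loss(clean, adv, 5))  # -1.0: targeted neuron separated
print(neuron_separation_loss(clean, adv, 0))  # -0.0: untouched neuron unchanged
```

In contrast, an embedding-level loss would average the discrepancy over all 64 channels, diluting the contribution of any single neuron; restricting the objective to one neuron concentrates the perturbation on that unit, which is the intuition behind NAT's improved transferability.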