NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability

📅 2025-08-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak transferability of adversarial examples, this paper proposes NAT (Neuron Attack for Transferability), a neuron-level attack method. Unlike conventional embedding-layer perturbations, NAT is the first to target a single critical neuron in an intermediate layer of the source model, optimizing its activation discrepancy via a generator to craft localized perturbations with enhanced representational disruption—thereby strengthening cross-model transferability. NAT employs a multi-generator ensemble strategy, enabling efficient low-query (≤10 queries) black-box attacks. Extensive evaluation across 41 ImageNet models and 9 fine-grained classification models demonstrates that NAT improves cross-model attack success rates by over 14% and cross-domain success rates by 4%, significantly outperforming state-of-the-art methods.

📝 Abstract
The generation of transferable adversarial perturbations typically involves training a generator to maximize embedding separation between clean and adversarial images at a single mid-layer of a source model. In this work, we build on this approach and introduce Neuron Attack for Transferability (NAT), a method designed to target a specific neuron within the embedding. Our approach is motivated by the observation that previous layer-level optimizations often disproportionately focus on a few neurons representing similar concepts, leaving other neurons within the attacked layer minimally affected. NAT shifts the focus from embedding-level separation to a more fundamental, neuron-specific approach. We find that targeting individual neurons effectively disrupts the core units of the neural network, providing a common basis for transferability across different models. Through extensive experiments on 41 diverse ImageNet models and 9 fine-grained models, NAT achieves fooling rates that surpass existing baselines by over 14% in cross-model and 4% in cross-domain settings. Furthermore, by leveraging the complementary attacking capabilities of the trained generators, we achieve impressive fooling rates within just 10 queries. Our code is available at: https://krishnakanthnakka.github.io/NAT/
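The abstract's core idea, shifting the generator's objective from embedding-level separation to the activation discrepancy of a single neuron (channel) in an intermediate layer, can be sketched as a loss function. This is a minimal NumPy illustration of that objective, not the paper's implementation; the function name `neuron_attack_loss` and the channel-indexed feature layout are assumptions for the sake of the example.

```python
import numpy as np

def neuron_attack_loss(feat_clean, feat_adv, neuron_idx):
    """Hypothetical sketch of NAT's neuron-level objective.

    Instead of separating the entire embedding, only the activation
    of one target neuron (channel) in an intermediate feature map is
    driven apart between clean and adversarial inputs.

    feat_clean, feat_adv: (batch, channels, h, w) intermediate
    activations of the source model for clean / adversarial images.
    """
    a_clean = feat_clean[:, neuron_idx]  # (batch, h, w) for one channel
    a_adv = feat_adv[:, neuron_idx]
    # Discrepancy at the chosen neuron; the generator is trained to
    # maximize it, so the minimized training loss is its negative.
    discrepancy = np.mean((a_adv - a_clean) ** 2)
    return -discrepancy
```

In a real attack pipeline the two feature maps would come from forward hooks on the source model, and the loss gradient would flow back into the perturbation generator.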
Problem

Research questions and friction points this paper is trying to address.

Targeting specific neurons to enhance adversarial attack transferability
Overcoming layer-level optimization limitations in neural network attacks
Disrupting core neural units for improved cross-model adversarial performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Targets specific neurons for adversarial attacks
Disrupts core neural network units directly
Enhances transferability across diverse models
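The low-query black-box setting (fooling rates within 10 queries) relies on an ensemble of generators with complementary attacking capabilities, each trained against a different target neuron. A minimal sketch of how such an ensemble could be queried is below; the helper names (`low_query_attack`, `victim_label`) and the try-each-generator loop are illustrative assumptions, not the paper's exact procedure.

```python
def low_query_attack(image, generators, victim_label, max_queries=10):
    """Hypothetical sketch of a multi-generator low-query attack.

    Each pre-trained generator perturbs the input differently
    (complementary target neurons); candidate perturbations are
    tried against the black-box victim until one changes its
    prediction, within a small query budget.
    """
    clean_label = victim_label(image)  # first query: clean prediction
    queries = 1
    for gen in generators:
        if queries >= max_queries:
            break
        adv = gen(image)               # crafted offline, no query cost
        queries += 1                   # one query to test the candidate
        if victim_label(adv) != clean_label:
            return adv, queries        # success within the budget
    return None, queries               # budget exhausted
```

With generators ordered by expected transferability, the loop stops at the first successful candidate, which is how a handful of queries can suffice.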