🤖 AI Summary
This work addresses the vulnerability of deep neural networks to adversarial examples. It proposes a defense built on the non-uniform amplification of adversarial perturbations across network layers, a phenomenon the authors formalize in a theorem that guarantees amplification under a set of sufficient conditions. A tailored spectral loss function and a dedicated network architecture strengthen this amplification signal during training, and a lightweight detector exploits it at inference time. The paper reports that the detector identifies adversarial inputs under state-of-the-art attacks as well as a purpose-built adaptive attack, supporting layer-wise amplification as a practical signal for adversarial defense.
📝 Abstract
The nonuniform and growing impact of adversarial noise across the layers of deep neural networks has been used in the literature, without a formal mathematical justification, to detect adversarial inputs and improve robustness. In this work, we study this phenomenon in detail and present a formal adversarial noise amplification theorem. We specify a set of sufficient conditions under which adversarial noise amplification is mathematically guaranteed. Based on these theoretical observations, we propose a novel training methodology, combining a custom spectral loss function with a specific architectural design, that enhances the amplification signal used to detect adversarial data. Finally, we introduce a new, lightweight detection mechanism that leverages the enhanced amplification signal and operates entirely at inference time. To validate our approach, we demonstrate the detector's efficacy against both state-of-the-art attacks and a purpose-built adaptive attack, confirming that enhanced amplification can serve as a robust and reliable signal for adversarial defense.
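The abstract does not define how amplification is measured, nor the form of the spectral loss or the detector. As a minimal illustration of what "layer-wise amplification" can mean, the sketch below assumes a plain bias-free ReLU MLP and measures amplification at layer l as the ratio of the perturbation norm at that layer to the input perturbation norm. Since ReLU is 1-Lipschitz, this ratio is bounded by the product of the layers' spectral norms, which ties amplification to spectral properties of the weights. All function names are hypothetical, and this standard Lipschitz bound is not the paper's theorem or its detector.

```python
import math
import random


def matvec(W, v):
    # Dense matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W, iters=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    n = len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        u = matvec(W, v)
        # v <- W^T u, then renormalize
        v = [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(n)]
        nv = norm(v)
        if nv == 0.0:
            return 0.0
        v = [x / nv for x in v]
    return norm(matvec(W, v))


def layerwise_amplification(weights, x, delta):
    """Hypothetical measure: ||h_l(x + delta) - h_l(x)|| / ||delta|| per layer."""
    clean = list(x)
    noisy = [a + b for a, b in zip(x, delta)]
    d0 = norm(delta)
    ratios = []
    for W in weights:
        clean = relu(matvec(W, clean))
        noisy = relu(matvec(W, noisy))
        ratios.append(norm([a - b for a, b in zip(noisy, clean)]) / d0)
    return ratios
```

In a detector built on this idea, one would compare such per-layer ratios (or a proxy computable without knowing the perturbation) against statistics collected on clean data; that step is left out here because the abstract does not describe it.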