🤖 AI Summary
Deep neural networks exhibit insufficient robustness against adversarial perturbations, and existing adversarial training methods often degrade clean-sample accuracy. To address this trade-off, we propose Adv-DPNP—a novel framework that deeply integrates discriminative prototype learning with adversarial training. Specifically, class-wise prototypes serve jointly as classifier weights and robust geometric anchors, explicitly modeling the intrinsic geometry of the latent space. A dual-branch prototype update mechanism is introduced: prototypes are updated exclusively on clean data, while the feature extractor is optimized jointly on both clean and adversarial examples. The training objective is a composite loss combining positive prototype alignment, negative prototype repulsion, and consistency regularization. Evaluated on standard benchmarks, Adv-DPNP significantly improves clean accuracy while preserving strong adversarial robustness and generalization to common data corruptions, consistently outperforming state-of-the-art methods.
📝 Abstract
Deep neural networks demonstrate significant vulnerability to adversarial perturbations, posing risks for critical applications. Current adversarial training methods predominantly focus on robustness against attacks without explicitly leveraging geometric structures in the latent space, usually resulting in reduced accuracy on the original clean data. To address these issues, we propose a novel adversarial training framework named Adversarial Deep Positive-Negative Prototypes (Adv-DPNP), which integrates discriminative prototype-based learning with adversarial training. Adv-DPNP uses unified class prototypes serving dual roles as classifier weights and robust anchors, enhancing both intra-class compactness and inter-class separation in the latent space. Moreover, a novel dual-branch training mechanism keeps the prototypes stable by updating them exclusively with clean data, while the feature extractor layers are trained on both clean and adversarial data so that the learned features remain invariant to adversarial perturbations. In addition, our approach utilizes a composite loss function combining positive prototype alignment, negative prototype repulsion, and consistency regularization to further enhance discrimination, adversarial robustness, and clean accuracy. Extensive experiments conducted on standard benchmark datasets confirm the effectiveness of Adv-DPNP compared to state-of-the-art methods, achieving higher clean accuracy and competitive robustness under adversarial perturbations and common corruptions. Our code is available at https://github.com/fum-rpl/adv-dpnp.
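To make the two key ideas concrete, the following is a minimal NumPy sketch of (a) a composite loss with positive-prototype alignment, negative-prototype repulsion, and clean/adversarial consistency, and (b) the dual-branch rule in which prototypes move only toward clean-data features. This is an illustrative approximation under assumed design choices (cosine similarity on unit-normalized features, an EMA prototype update); the exact loss terms, weightings, and update rule in the paper may differ.

```python
import numpy as np

def normalize(x, axis=-1):
    """Project features/prototypes onto the unit sphere."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)

def dpnp_loss(z_clean, z_adv, prototypes, labels):
    """Hypothetical composite loss: alignment + repulsion + consistency.

    z_clean, z_adv: (n, d) features of clean and adversarial inputs.
    prototypes:     (C, d) class prototypes (also act as classifier weights).
    labels:         (n,)   integer class labels.
    """
    P = normalize(prototypes)
    zc, za = normalize(z_clean), normalize(z_adv)

    # Positive alignment: pull features toward their own class prototype.
    pos = lambda z: 1.0 - np.sum(z * P[labels], axis=1)

    # Negative repulsion: penalize similarity to all other prototypes.
    sims = zc @ P.T
    mask = np.ones_like(sims)
    mask[np.arange(len(labels)), labels] = 0.0
    neg = np.sum(np.maximum(sims, 0.0) * mask, axis=1) / (P.shape[0] - 1)

    # Consistency: clean and adversarial features should agree.
    cons = np.sum((zc - za) ** 2, axis=1)

    return np.mean(pos(zc) + pos(za) + neg + cons)

def update_prototypes(prototypes, z_clean, labels, momentum=0.9):
    """Dual-branch update: prototypes track CLEAN features only (EMA).

    Adversarial examples never move the prototypes; they only influence
    the feature extractor through the loss above.
    """
    P = prototypes.copy()
    for c in np.unique(labels):
        class_mean = normalize(z_clean[labels == c].mean(axis=0))
        P[c] = momentum * P[c] + (1.0 - momentum) * class_mean
    return normalize(P)
```

In this sketch the same prototype matrix `P` produces the classification logits (`z @ P.T`) and anchors the geometric losses, which is what lets a single set of parameters serve as both classifier weights and latent-space anchors.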