🤖 AI Summary
Deep learning models deployed in safety-critical applications are vulnerable to adversarial attacks, yet effective unsupervised detection methods—especially those not requiring adversarial examples for training—remain scarce. To address this, we propose U-CAN, an unsupervised adversarial detection framework that operates without access to adversarial samples during training. U-CAN embeds lightweight contrastive auxiliary networks into intermediate layers of the target model; trained with an ArcFace loss, these networks project features into a contrastive space where class-discriminative boundaries are implicitly modeled, enabling robust identification of anomalous feature behavior. The framework is plug-and-play and architecture-agnostic, supporting ResNet, VGG, and ViT backbones. Evaluated on CIFAR-10, Mammals, and an ImageNet subset against four attack types (including FGSM and PGD), U-CAN achieves state-of-the-art F1 scores among unsupervised methods—marking the first instance of high-accuracy, cross-architecture adversarial detection without reliance on adversarial training data.
📝 Abstract
Deep learning models are widely employed in safety-critical applications yet remain susceptible to adversarial attacks -- imperceptible perturbations that can significantly degrade model performance. Conventional defense mechanisms predominantly focus on either enhancing model robustness or detecting adversarial inputs, treating the two as independent problems. In this work, we propose Unsupervised adversarial detection via Contrastive Auxiliary Networks (U-CAN), which uncovers adversarial behavior within auxiliary feature representations without requiring adversarial examples. U-CAN attaches auxiliary networks to selected intermediate layers of the target model. These networks, each comprising a projection layer and an ArcFace-based linear layer, refine feature representations to more effectively distinguish between benign and adversarial inputs. Comprehensive experiments across multiple datasets (CIFAR-10, Mammals, and a subset of ImageNet) and architectures (ResNet-50, VGG-16, and ViT) demonstrate that our method surpasses existing unsupervised adversarial detection techniques, achieving superior F1 scores against four distinct attack methods. The proposed framework provides a scalable and effective solution for enhancing the security and reliability of deep learning systems.
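The ArcFace-based auxiliary head described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: the feature dimensions, projection, margin `m`, and scale `s` are illustrative assumptions. It shows the core ArcFace idea the auxiliary networks rely on — L2-normalizing embeddings and class weights, then adding an angular margin to the ground-truth class before scaling the cosine logits.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Project vectors onto the unit hypersphere (standard ArcFace preprocessing).
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def arcface_logits(features, weights, labels, s=30.0, m=0.5):
    """Cosine logits with an additive angular margin m on each sample's
    true class, scaled by s. (Production implementations also guard the
    theta + m > pi corner case, omitted here for brevity.)"""
    emb = l2_normalize(features)             # (N, D) unit embeddings
    w = l2_normalize(weights, axis=0)        # (D, C) unit class weights
    cos = emb @ w                            # (N, C) cosine similarities
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    one_hot = np.zeros_like(cos)
    one_hot[np.arange(len(labels)), labels] = 1.0
    # Margin is added only to the angle of the ground-truth class.
    return s * np.cos(theta + m * one_hot)

# Toy auxiliary head: project a hypothetical 512-d intermediate feature
# map to a 64-d contrastive space, then score against 10 classes.
rng = np.random.default_rng(0)
proj = rng.normal(size=(512, 64))            # projection layer (illustrative)
W = rng.normal(size=(64, 10))                # ArcFace class weights
feats = rng.normal(size=(4, 512)) @ proj     # 4 intermediate feature vectors
logits = arcface_logits(feats, W, labels=np.array([0, 1, 2, 3]))
```

Because the margin shrinks the true-class logit during training, the network is pushed to separate classes by angle; at detection time, adversarial inputs tend to produce embeddings that sit anomalously relative to these angular class boundaries.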