🤖 AI Summary
Existing adversarial attack detection methods rely heavily on handcrafted features and attack-specific priors, resulting in poor generalizability and high engineering overhead. To address these limitations, we propose a lightweight, attack-agnostic detection framework built on the CLIP dual-encoder architecture. Our approach jointly fine-tunes learnable adapters and prompt embeddings to construct a compact representation space specific to natural images; inputs that fall outside this space are flagged as anomalous. Because detection is framed as unsupervised or semi-supervised anomaly detection, the method requires neither labeled adversarial examples nor prior knowledge of attack types. Extensive experiments demonstrate significant performance gains under both known and unknown attacks, with a minimal trainable parameter count (<0.5M) and low training cost. The method establishes a robust defense paradigm that is hypothesis-free (i.e., makes no attack assumptions), generalizes across diverse threat models, and remains computationally efficient.
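The summary's architecture can be illustrated with a minimal sketch. The paper's exact design is not reproduced here, so the frozen image encoder below is a random stand-in for CLIP's pre-trained tower, and the adapter/prompt shapes are illustrative assumptions; only the adapter and the learnable prompt embeddings are trained, keeping the trainable footprint well under 0.5M parameters:

```python
# Hedged sketch of the described setup: a frozen encoder (stand-in for CLIP's
# image tower), a small residual adapter, and learnable prompt embeddings whose
# pooled vector acts as a "natural image" prototype. Distance from that
# prototype serves as the anomaly score. All dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Lightweight bottleneck adapter with a residual connection."""
    def __init__(self, dim=512, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.relu(self.down(x)))

class CLIPAnomalyDetector(nn.Module):
    def __init__(self, dim=512, n_prompt_tokens=8):
        super().__init__()
        # Frozen stand-in for a pre-trained image encoder (assumption:
        # in the real method this would be CLIP's visual encoder).
        self.image_encoder = nn.Linear(3 * 32 * 32, dim)
        self.image_encoder.requires_grad_(False)
        # Learnable prompt embeddings, pooled into one prototype vector.
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, dim) * 0.02)
        self.adapter = Adapter(dim)

    def anomaly_score(self, images):
        feat = self.adapter(self.image_encoder(images.flatten(1)))
        proto = self.prompt.mean(dim=0)
        # Low similarity to the natural-image prototype -> high anomaly score.
        sim = F.cosine_similarity(feat, proto.unsqueeze(0), dim=-1)
        return 1.0 - sim
```

Only `adapter` and `prompt` receive gradients, which is how a setup like this stays parameter-efficient: the large pre-trained encoders are left untouched.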
📝 Abstract
Adversarial attacks pose a critical security threat to real-world AI systems by injecting human-imperceptible perturbations into benign samples to induce misclassification in deep learning models. While existing detection methods, such as Bayesian uncertainty estimation and activation pattern analysis, have made progress through feature engineering, their reliance on handcrafted feature design and prior knowledge of attack patterns limits generalization and incurs high engineering costs. To address these limitations, this paper proposes a lightweight adversarial detection framework based on the large-scale pre-trained vision-language model CLIP. Departing from conventional adversarial-feature characterization, we instead adopt an anomaly detection perspective. By jointly fine-tuning CLIP's dual visual-text encoders with trainable adapter networks and learnable prompts, we construct a compact representation space tailored to natural images, so that inputs deviating from this space can be flagged as adversarial. Our detection architecture achieves substantial improvements in generalization across both known and unknown attack patterns compared to traditional methods, while significantly reducing training overhead. This study offers a novel technical pathway toward a parameter-efficient, attack-agnostic defense paradigm, markedly enhancing the robustness of vision systems against evolving adversarial threats.
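Because the abstract frames detection as anomaly detection rather than attack classification, the decision rule only needs clean data. A minimal, hedged sketch of one standard way to do this (the quantile-based calibration below is a common anomaly-detection convention, not necessarily the paper's exact procedure):

```python
# Hedged sketch: calibrate a detection threshold on anomaly scores of clean
# (natural) images only, then flag any test sample whose score exceeds it.
# No adversarial examples or attack labels are needed, matching the
# attack-agnostic framing. The 5% target false-positive rate is illustrative.
import numpy as np

def calibrate_threshold(clean_scores, target_fpr=0.05):
    """Pick a threshold so ~target_fpr of clean samples are wrongly flagged."""
    return float(np.quantile(clean_scores, 1.0 - target_fpr))

def detect(scores, threshold):
    """Return True for samples whose anomaly score exceeds the threshold."""
    return np.asarray(scores) > threshold
```

Calibrating on clean scores alone is what makes the detector indifferent to the attack that produced a perturbation: any input that drifts far enough from the natural-image score distribution is flagged, regardless of how it was crafted.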