🤖 AI Summary
Adversarial examples severely compromise the robustness of deep neural networks (DNNs), yet existing defenses either rely on attack-specific priors or require architectural modifications, and thus suffer from poor generalizability and computational overhead. This paper proposes a universal, lightweight adversarial example detection framework that requires no model fine-tuning or additional training. By statistically analyzing the distributions of layer-wise activations (including gradient-sensitivity modeling and anomaly detection), it enables plug-and-play detection across diverse architectures and modalities (image, video, and audio). Our key contribution is the first statistically grounded, assumption-free, and interpretable detection paradigm, derived purely from intrinsic activation statistics and therefore free of any dependence on prior knowledge of attack types. Extensive evaluation demonstrates >95% detection accuracy across multiple datasets and attack settings, with inference overhead under 0.5% of the original model's computational cost, significantly outperforming state-of-the-art methods.
📝 Abstract
Deep Neural Networks (DNNs) have been shown to be vulnerable to adversarial examples. While numerous successful adversarial attacks have been proposed, defenses against these attacks remain relatively understudied. Existing defense approaches either focus on negating the effects of perturbations caused by the attacks to restore the DNNs' original predictions or use a secondary model to detect adversarial examples. However, these methods often become ineffective due to the continuous advancements in attack techniques. We propose a novel universal and lightweight method to detect adversarial examples by analyzing the layer outputs of DNNs. Through theoretical justification and extensive experiments, we demonstrate that our detection method is highly effective, compatible with any DNN architecture, and applicable across different domains, such as image, video, and audio.
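To make the core idea of detection from layer outputs concrete, the sketch below shows one plausible instantiation: fit simple per-feature Gaussian statistics on a layer's activations over clean data, then flag inputs whose activations deviate strongly. This is an illustrative assumption, not the paper's actual algorithm; the synthetic arrays stand in for activations that would normally be captured with forward hooks on a real DNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for one layer's activations (hypothetical data):
# in practice these would be collected via forward hooks on a trained DNN.
clean_acts = rng.normal(size=(1000, 64))          # clean calibration set
test_clean = rng.normal(size=64)                  # in-distribution sample
test_adv = rng.normal(size=64) + 6.0              # sample with shifted activations

# Fit per-feature Gaussian statistics on clean activations only;
# no retraining or model modification is needed.
mu = clean_acts.mean(axis=0)
sigma = clean_acts.std(axis=0) + 1e-8             # avoid division by zero

def anomaly_score(act):
    """Mean absolute z-score of a layer's activations under clean statistics."""
    return float(np.abs((act - mu) / sigma).mean())

# Threshold would be calibrated on held-out clean data in practice.
threshold = 2.0

print(anomaly_score(test_clean) > threshold)      # clean input: not flagged
print(anomaly_score(test_adv) > threshold)        # shifted input: flagged
```

The same scoring could be applied per layer and aggregated across layers, which is what makes this style of detector architecture-agnostic: it only consumes activation tensors, regardless of the model or input modality.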