🤖 AI Summary
To address the limitations of existing adversarial example detection methods in generality, efficiency, and robustness, this paper proposes a lightweight detection framework based on modeling the non-uniform perturbation of features across layers. Methodologically, it uses early-layer features to predict deep-layer features via regression and identifies adversarial perturbations by analyzing the prediction residuals—without retraining the backbone model, thus enabling cross-architecture and cross-modal (image/video/audio) deployment. Its key contribution is the first explicit modeling of hierarchical feature deviations as regression residuals, enabling highly sensitive detection. Experiments demonstrate that the method achieves an average detection accuracy exceeding 95% against diverse state-of-the-art attacks, incurs less than 1% inference overhead relative to the main model, and operates in real time—significantly outperforming existing detectors.
📝 Abstract
Deep Neural Networks (DNNs) are notoriously vulnerable to adversarial input designs with limited noise budgets. While numerous successful attacks with subtle modifications to the original input have been proposed, defense techniques against these attacks are relatively understudied. Existing defense approaches either focus on improving DNN robustness by negating the effects of perturbations or use a secondary model to detect adversarial data. Although equally important, the attack detection approach, which is studied in this work, provides a more practical defense compared to the robustness approach. We show that the existing detection methods are either ineffective against the state-of-the-art attack techniques or computationally inefficient for real-time processing. We propose a novel universal and efficient method to detect adversarial examples by analyzing the varying degrees of impact of attacks on different DNN layers. Our method trains a lightweight regression model that predicts deeper-layer features from early-layer features, and uses the prediction error to detect adversarial samples. Through theoretical arguments and extensive experiments, we demonstrate that our detection method is highly effective, computationally efficient for real-time processing, compatible with any DNN architecture, and applicable across different domains, such as image, video, and audio.
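The detection idea described in the abstract—regress deep-layer features from early-layer features on clean data, then flag inputs whose prediction residual is abnormally large—can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, the synthetic data, and the plain least-squares regressor are all hypothetical stand-ins for a real DNN's intermediate activations and the paper's lightweight regression model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for early-layer and deep-layer features extracted
# from clean training samples (a real setup would hook a DNN's activations).
n_train, d_early, d_deep = 200, 32, 16
W_true = rng.normal(size=(d_early, d_deep))
early_clean = rng.normal(size=(n_train, d_early))
deep_clean = early_clean @ W_true + 0.01 * rng.normal(size=(n_train, d_deep))

# Fit the lightweight regressor: predict deep features from early features.
W, *_ = np.linalg.lstsq(early_clean, deep_clean, rcond=None)

def residual(early, deep):
    """Prediction error between actual and regressed deep-layer features."""
    return np.linalg.norm(deep - early @ W, axis=-1)

# Detection threshold calibrated on clean residuals (a high percentile).
tau = np.percentile(residual(early_clean, deep_clean), 99)

def is_adversarial(early, deep):
    """Flag samples whose deep features deviate from the regression."""
    return residual(early, deep) > tau

# Adversarial inputs are assumed to distort deep-layer features far more
# than early-layer ones, producing a large residual.
early_adv = rng.normal(size=(1, d_early))
deep_adv = early_adv @ W_true + 5.0 * rng.normal(size=(1, d_deep))
print(is_adversarial(early_adv, deep_adv))
```

The overhead of this detector is a single matrix multiply and a norm per sample, which is consistent with the summary's claim of under 1% inference cost; the choice of which layers to regress between, and the exact regressor, are design decisions left to the paper.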