GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients

📅 2025-12-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks are vulnerable to imperceptible adversarial perturbations, posing critical safety risks in high-stakes applications such as medical diagnosis and autonomous driving. To address this, we propose a geometric detection method based on the intrinsic dimensionality (ID) of gradient manifolds, revealing a stable dimensional disparity between natural and adversarial samples in gradient space. Our approach establishes the first general-purpose adversarial detection paradigm grounded in the geometric structure of gradient manifolds; it requires no model modification or retraining and supports both single-sample and batch-mode deployment. On CIFAR-10 and MS COCO it sets new state-of-the-art results for single-sample detection, with detection rates consistently above 92% on CIFAR-10 against strong attacks including CW and AutoAttack. In batch-mode evaluation on MNIST and SVHN, it achieves the best performance among existing methods.
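For intuition, a minimal sketch of the gradient-space representation the summary refers to is shown below. It assumes a PyTorch classifier and takes the loss gradient with respect to the input as the per-sample vector; the paper may compute its gradients differently, and the function name is illustrative.

```python
# Hypothetical sketch (not the authors' code): per-sample loss
# gradients, the vectors whose manifold geometry GradID analyzes.
import torch
import torch.nn.functional as F

def loss_gradients(model, x, y):
    """Return one flattened loss-gradient vector per input sample."""
    x = x.clone().detach().requires_grad_(True)
    # 'sum' reduction keeps each sample's gradient independent
    loss = F.cross_entropy(model(x), y, reduction="sum")
    (grad,) = torch.autograd.grad(loss, x)
    return grad.flatten(start_dim=1)  # shape: (batch, input_dim)
```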

📝 Abstract
Despite their remarkable performance, deep neural networks exhibit a critical vulnerability: small, often imperceptible, adversarial perturbations can lead to drastically altered model predictions. Given the stringent reliability demands of applications such as medical diagnosis and autonomous driving, robust detection of such adversarial attacks is paramount. In this paper, we investigate the geometric properties of a model's input loss landscape. We analyze the Intrinsic Dimensionality (ID) of the model's gradient parameters, which quantifies the minimal number of coordinates required to describe the data points on their underlying manifold. We reveal a distinct and consistent difference in the ID for natural and adversarial data, which forms the basis of our proposed detection method. We validate our approach across two distinct operational scenarios. First, in a batch-wise context for identifying malicious data groups, our method demonstrates high efficacy on datasets like MNIST and SVHN. Second, in the critical individual-sample setting, we establish new state-of-the-art results on challenging benchmarks such as CIFAR-10 and MS COCO. Our detector significantly surpasses existing methods against a wide array of attacks, including CW and AutoAttack, achieving detection rates consistently above 92% on CIFAR-10. The results underscore the robustness of our geometric approach, highlighting that intrinsic dimensionality is a powerful fingerprint for adversarial detection across diverse datasets and attack strategies.
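To make the ID statistic concrete, here is a minimal sketch pairing the Levina–Bickel maximum-likelihood ID estimator with a simple threshold test over a set of gradient vectors (e.g., those from loss_gradients above). The estimator choice, neighborhood size k, and tolerance tol are assumptions for illustration; the abstract does not specify the estimator or decision rule used in the paper.

```python
# Hedged sketch: MLE intrinsic-dimensionality estimate (Levina &
# Bickel, 2004) plus an illustrative batch-level decision rule.
import torch

def mle_intrinsic_dim(vectors, k=10):
    """MLE ID estimate for an (n, d) tensor of gradient vectors."""
    dist = torch.cdist(vectors, vectors)          # pairwise L2 distances
    knn = dist.sort(dim=1).values[:, 1:k + 1]     # k nearest, self excluded
    # Per-point inverse ID: mean over j < k of log(T_k / T_j)
    inv_id = torch.log(knn[:, -1:] / knn[:, :-1]).mean(dim=1)
    return (1.0 / inv_id).mean().item()

def flag_adversarial_batch(grads, natural_id, tol=0.15, k=10):
    """Flag a batch whose gradient ID deviates from a clean-data
    reference by more than a relative tolerance (assumed rule)."""
    batch_id = mle_intrinsic_dim(grads, k=k)
    return abs(batch_id - natural_id) / natural_id > tol
```

In practice natural_id and tol would be calibrated on held-out clean data; the single-sample setting would need a per-sample ID statistic, which the abstract does not detail.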
Problem

Research questions and friction points this paper is trying to address.

Detect adversarial attacks on deep neural networks
Analyze gradient intrinsic dimensionality for detection
Validate detection in batch and individual sample scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects adversarial attacks via gradient intrinsic dimensionality
Analyzes geometric properties of input loss landscape
Validated in both batch and individual-sample scenarios
Mohammad Mahdi Razmjoo
Sharif University of Technology
Mohammad Mahdi Sharifian
Sharif University of Technology
Saeed Bagheri Shouraki
Professor of Electrical Engineering, Sharif University
Research interests: Fuzzy, ANN, Control, Robotics, AI