GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients

📅 2025-12-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks are vulnerable to imperceptible adversarial perturbations, posing critical safety risks in high-stakes applications such as medical diagnosis and autonomous driving. To address this, we propose a geometric detection method based on the intrinsic dimensionality (ID) of gradient manifolds, revealing a stable dimensional disparity between natural and adversarial samples in gradient space. Our approach establishes the first general-purpose adversarial detection paradigm grounded in the geometric structure of gradient manifolds; it requires no model modification or retraining and supports both single-sample and batch-mode deployment. On CIFAR-10 and MS COCO it sets new state-of-the-art results for single-sample detection, with detection rates consistently above 92% on CIFAR-10 against strong attacks including CW and AutoAttack. In batch-mode evaluation on MNIST and SVHN, it achieves the best performance among existing methods.
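For intuition, a minimal sketch of the gradient-space representation the summary refers to is shown below. It assumes a PyTorch classifier and takes the loss gradient with respect to the input as the per-sample vector; the paper may compute its gradients differently, and the function name is illustrative.

```python
# Hypothetical sketch (not the authors' code): per-sample loss
# gradients, the vectors whose manifold geometry GradID analyzes.
import torch
import torch.nn.functional as F

def loss_gradients(model, x, y):
    """Return one flattened loss-gradient vector per input sample."""
    x = x.clone().detach().requires_grad_(True)
    # 'sum' reduction keeps each sample's gradient independent
    loss = F.cross_entropy(model(x), y, reduction="sum")
    (grad,) = torch.autograd.grad(loss, x)
    return grad.flatten(start_dim=1)  # shape: (batch, input_dim)
```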

📝 Abstract
Despite their remarkable performance, deep neural networks exhibit a critical vulnerability: small, often imperceptible, adversarial perturbations can lead to drastically altered model predictions. Given the stringent reliability demands of applications such as medical diagnosis and autonomous driving, robust detection of such adversarial attacks is paramount. In this paper, we investigate the geometric properties of a model's input loss landscape. We analyze the Intrinsic Dimensionality (ID) of the model's gradient parameters, which quantifies the minimal number of coordinates required to describe the data points on their underlying manifold. We reveal a distinct and consistent difference in the ID for natural and adversarial data, which forms the basis of our proposed detection method. We validate our approach across two distinct operational scenarios. First, in a batch-wise context for identifying malicious data groups, our method demonstrates high efficacy on datasets like MNIST and SVHN. Second, in the critical individual-sample setting, we establish new state-of-the-art results on challenging benchmarks such as CIFAR-10 and MS COCO. Our detector significantly surpasses existing methods against a wide array of attacks, including CW and AutoAttack, achieving detection rates consistently above 92% on CIFAR-10. The results underscore the robustness of our geometric approach, highlighting that intrinsic dimensionality is a powerful fingerprint for adversarial detection across diverse datasets and attack strategies.
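To make the ID statistic concrete, here is a minimal sketch pairing the Levina–Bickel maximum-likelihood ID estimator with a simple threshold test over a set of gradient vectors (e.g., those from loss_gradients above). The estimator choice, neighborhood size k, and tolerance tol are assumptions for illustration; the abstract does not specify the estimator or decision rule used in the paper.

```python
# Hedged sketch: MLE intrinsic-dimensionality estimate (Levina &
# Bickel, 2004) plus an illustrative batch-level decision rule.
import torch

def mle_intrinsic_dim(vectors, k=10):
    """MLE ID estimate for an (n, d) tensor of gradient vectors."""
    dist = torch.cdist(vectors, vectors)          # pairwise L2 distances
    knn = dist.sort(dim=1).values[:, 1:k + 1]     # k nearest, self excluded
    # Per-point inverse ID: mean over j < k of log(T_k / T_j)
    inv_id = torch.log(knn[:, -1:] / knn[:, :-1]).mean(dim=1)
    return (1.0 / inv_id).mean().item()

def flag_adversarial_batch(grads, natural_id, tol=0.15, k=10):
    """Flag a batch whose gradient ID deviates from a clean-data
    reference by more than a relative tolerance (assumed rule)."""
    batch_id = mle_intrinsic_dim(grads, k=k)
    return abs(batch_id - natural_id) / natural_id > tol
```

In practice natural_id and tol would be calibrated on held-out clean data; the single-sample setting would need a per-sample ID statistic, which the abstract does not detail.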
Problem

Research questions and friction points this paper is trying to address.

Detect adversarial attacks on deep neural networks
Analyze gradient intrinsic dimensionality for detection
Validate detection in batch and individual sample scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects adversarial attacks via gradient intrinsic dimensionality
Analyzes geometric properties of input loss landscape
Validated in both batch and individual-sample scenarios
Mohammad Mahdi Razmjoo
Sharif University of Technology
Mohammad Mahdi Sharifian
Sharif University of Technology
Saeed Bagheri Shouraki
Professor of Electrical Engineering, Sharif University
Research interests: Fuzzy, ANN, Control, Robotics, AI