🤖 AI Summary
This work identifies two critical deficiencies of the DINOv2 foundation model in few-shot anomaly detection: insufficient adversarial robustness (e.g., significant performance degradation under FGSM attacks) and miscalibrated anomaly scoring (high-confidence false positives). To surface these issues, we conduct one of the first systematic evaluations of its adversarial security and uncertainty reliability, proposing a white-box adversarial attack framework that crafts perturbations through a lightweight linear head on frozen DINOv2 features, together with a post-hoc Platt scaling calibration mechanism. Experiments on MVTec-AD and VisA demonstrate that calibration enlarges the predictive-entropy separation between clean and adversarially perturbed samples, improves the detectability of adversarial perturbations, and substantially reduces Expected Calibration Error (ECE). Our core contribution is a unified evaluation paradigm for jointly assessing robustness and calibration in foundation vision models for anomaly detection, alongside a lightweight, plug-and-play baseline for attack flagging and score calibration.
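The summary measures calibration with Expected Calibration Error (ECE). A minimal binned-ECE sketch in NumPy follows; the bin count and toy data are illustrative, not the paper's evaluation setup:

```python
import numpy as np

def ece(probs, labels, n_bins=10):
    """Expected Calibration Error with equal-width confidence bins.

    probs  : predicted probability of the positive class
    labels : ground-truth labels in {0, 1}
    ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)|, where confidence
    is max(p, 1 - p) and accuracy is the fraction of correct predictions.
    """
    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1 - probs)
    correct = (preds == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return total

# toy batch whose confidence (0.75) matches its accuracy (3/4 correct)
probs = np.array([0.75, 0.75, 0.75, 0.25])
labels = np.array([1, 1, 1, 1])
print(round(ece(probs, labels), 3))  # -> 0.0: perfectly calibrated
```

Overconfident predictions (e.g., confidence 0.99 with accuracy 0.75) would instead yield a large positive ECE, which is the failure mode the summary attributes to raw anomaly scores.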
📝 Abstract
Foundation models such as DINOv2 have shown strong performance in few-shot anomaly detection, yet two key questions remain unexamined: (i) how susceptible are these detectors to adversarial perturbations; and (ii) how well do their anomaly scores reflect calibrated uncertainty? Building on AnomalyDINO, a training-free deep nearest-neighbor detector over DINOv2 features, we present one of the first systematic studies of adversarial attacks and uncertainty estimation in this setting. To enable white-box gradient attacks while preserving test-time behavior, we attach a lightweight linear head to frozen DINOv2 features, used only to craft perturbations. Using this surrogate, we evaluate the impact of FGSM across the MVTec-AD and VisA datasets and observe consistent drops in F1, AUROC, AP, and G-mean, indicating that imperceptible perturbations can flip nearest-neighbor relations in feature space and induce confident misclassification. Complementing robustness, we probe reliability and find that raw anomaly scores are poorly calibrated, revealing a gap between confidence and correctness that limits safety-critical use. As a simple, strong baseline toward trustworthiness, we apply post-hoc Platt scaling to the anomaly scores for uncertainty estimation. The resulting calibrated posteriors yield significantly higher predictive entropy on adversarially perturbed inputs than on clean ones, enabling a practical flagging mechanism for attack detection while reducing Expected Calibration Error (ECE). Our findings surface concrete vulnerabilities in DINOv2-based few-shot anomaly detectors and establish an evaluation protocol and baseline for robust, uncertainty-aware anomaly detection. We argue that adversarial robustness and principled uncertainty quantification are not optional add-ons but essential capabilities if anomaly detection systems are to be trustworthy and ready for real-world deployment.
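As a concrete illustration of the gradient-sign attack the abstract describes, here is a minimal NumPy sketch of one-step FGSM against a logistic linear head. For simplicity it perturbs the feature vector directly rather than backpropagating through the frozen DINOv2 backbone to the image pixels, so it is a schematic of the FGSM step, not the paper's pipeline; all names and the toy data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One-step FGSM on a logistic (linear-head) classifier.

    x : (d,) feature vector, y : label in {0, 1},
    w : (d,) head weights, b : bias, eps : attack budget.
    For binary cross-entropy loss L, the input gradient is
    dL/dx = (sigmoid(w @ x + b) - y) * w, so the attack takes
    x' = x + eps * sign(dL/dx), maximizing loss under an
    L-infinity budget of eps.
    """
    p = sigmoid(w @ x + b)
    grad = (p - y) * w
    return x + eps * np.sign(grad)

# toy demo: the perturbation pushes the score away from the true label
rng = np.random.default_rng(0)
w = rng.normal(size=8)
x = w.copy()                        # sample strongly aligned with the head
clean = sigmoid(w @ x)              # confident score for class 1
adv = sigmoid(w @ fgsm_perturb(x, 1, w, 0.0, 0.5))
print(clean > adv)                  # attack lowers confidence in the true label
```

In the paper's white-box setting the same sign-of-gradient step would be taken in image space, with the gradient flowing through the frozen backbone and the attached head.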
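The calibration-and-flagging baseline can likewise be sketched: fit Platt scaling p = sigmoid(a*s + b) on held-out anomaly scores, then flag inputs whose calibrated posterior has high predictive entropy. The gradient-descent fit, its hyperparameters, and the toy score distributions below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit Platt scaling p = sigmoid(a * s + b) by gradient descent
    on binary cross-entropy (learning rate and step count are
    hypothetical choices)."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        err = sigmoid(a * scores + b) - labels
        a -= lr * np.mean(err * scores)
        b -= lr * np.mean(err)
    return a, b

def entropy(p):
    """Binary predictive entropy in nats, numerically safe near 0 and 1."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# toy anomaly scores: normals cluster low, anomalies high
normal = 0.2 + 0.05 * np.random.default_rng(1).normal(size=50)
anom = 0.8 + 0.05 * np.random.default_rng(2).normal(size=50)
scores = np.concatenate([normal, anom])
labels = np.concatenate([np.zeros(50), np.ones(50)])
a, b = fit_platt(scores, labels)

attacked = 0.5                      # an adversarial score drifting to the boundary
h_clean = entropy(sigmoid(a * normal + b)).mean()
h_attack = entropy(sigmoid(a * attacked + b))
print(h_attack > h_clean)           # boundary scores carry higher entropy
```

Thresholding this calibrated entropy gives the kind of practical flagging mechanism for perturbed inputs that the abstract reports.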