🤖 AI Summary
To address data scarcity, high annotation costs, and dynamic environmental shifts in industrial and medical anomaly detection, this paper proposes CoZAD, a zero-shot anomaly detection framework. Methodologically, CoZAD integrates confidence-aware meta-learning with contrastive feature representation: it employs IQR-based uncertainty quantification and covariance regularization to weight the learning of prototypical normal patterns while preserving boundary samples, and contrastive learning constructs a discriminative feature space, enabling rapid domain adaptation without vision-language alignment or model ensembling. By combining soft confident learning, MAML, and feature clustering, CoZAD substantially improves generalization to unseen anomalies. Evaluated on ten industrial and medical datasets, it achieves state-of-the-art performance, surpassing prior methods on 6 of 7 industrial benchmarks and attaining image-level AUROC scores of 99.2% on DTD-Synthetic and 97.2% on BTAD, alongside a pixel-level AUROC of 96.3% on MVTec-AD.
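The summary above notes that contrastive learning shapes a feature space where normal patterns cluster tightly. The paper's exact loss is not reproduced here; as a hedged illustration, a generic InfoNCE-style contrastive objective (a common choice for this kind of representation learning, and an assumption on our part) can be sketched in a few lines of numpy. Each anchor embedding is pulled toward its same-index positive and pushed away from all other rows:

```python
import numpy as np

def info_nce(anchors: np.ndarray, positives: np.ndarray,
             temperature: float = 0.1) -> float:
    """Minimal InfoNCE-style contrastive loss (illustrative sketch,
    not the paper's exact objective). Row i of `positives` is the
    positive for row i of `anchors`; all other rows act as negatives."""
    # L2-normalize so similarity is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Cross-entropy with the matching (diagonal) pair as the target.
    return float(-np.log(np.diag(probs)).mean())

anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
loss_aligned = info_nce(anchors, anchors.copy())           # positives match
loss_mismatched = info_nce(anchors, anchors[::-1].copy())  # positives swapped
```

Minimizing such a loss drives matched views together and mismatched views apart, which is the mechanism behind the compact normal clusters the summary describes.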
📝 Abstract
Industrial and medical anomaly detection faces critical challenges from data scarcity and prohibitive annotation costs, particularly in evolving manufacturing and healthcare settings. To address this, we propose CoZAD, a novel zero-shot anomaly detection framework that integrates soft confident learning with meta-learning and contrastive feature representation. Unlike traditional confident learning, which discards uncertain samples, our method assigns confidence-based weights to all training data, preserving boundary information while emphasizing prototypical normal patterns. The framework quantifies data uncertainty through IQR-based thresholding and model uncertainty via covariance-based regularization within a Model-Agnostic Meta-Learning (MAML) framework. Contrastive learning creates a discriminative feature space in which normal patterns form compact clusters, enabling rapid domain adaptation. Comprehensive evaluation across 10 datasets spanning industrial and medical domains demonstrates state-of-the-art performance, outperforming existing methods on 6 out of 7 industrial benchmarks with notable improvements on texture-rich datasets (99.2% I-AUROC on DTD-Synthetic, 97.2% on BTAD) and pixel-level localization (96.3% P-AUROC on MVTec-AD). The framework eliminates dependence on vision-language alignment or model ensembles, making it valuable for resource-constrained environments requiring rapid deployment.
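The abstract's key departure from standard confident learning is that samples falling outside an IQR band are down-weighted rather than discarded. The paper does not spell out its weighting function here, so the sketch below is an assumption: it uses the conventional Tukey fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR) and a hypothetical exponential decay of weight with distance outside the band, purely to make the "soft, not hard, filtering" idea concrete:

```python
import numpy as np

def soft_confidence_weights(scores: np.ndarray) -> np.ndarray:
    """Soft IQR-based confidence weights (illustrative sketch, not the
    paper's exact formulation). In-band samples get weight 1.0; samples
    outside the Tukey fences are smoothly down-weighted instead of
    being removed, so boundary information is preserved."""
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Distance outside the [lo, hi] band; zero for in-band samples.
    dist = np.maximum(np.maximum(lo - scores, scores - hi), 0.0)
    # Hypothetical soft weighting: exponential decay scaled by the IQR.
    return np.exp(-dist / (iqr + 1e-8))

# Four typical samples and one extreme outlier.
w = soft_confidence_weights(np.array([0.1, 0.2, 0.25, 0.3, 5.0]))
```

Hard confident learning would drop the last sample outright; here it keeps a small positive weight, which is what lets the framework retain boundary samples while still emphasizing prototypical normal patterns.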