🤖 AI Summary
This work addresses the challenges of scarce supervision and diverse defect patterns in image anomaly detection when only a few normal samples are available. To this end, we propose a deviation-guided prompt learning framework that leverages the semantic priors of vision-language models. Our approach employs learnable context prompts shared across normal and abnormal classes, combined with anomaly-aware suffixes, to discriminate between normal and anomalous semantics. Furthermore, we introduce a Top-K multiple-instance loss grounded in Gaussian distribution modeling, which quantifies the statistical deviation of image patches from the normal distribution and thereby improves anomaly localization. Extensive experiments on the MVTec AD and VisA benchmarks demonstrate that our method significantly outperforms existing approaches such as PromptAD. Ablation studies further validate the effectiveness and interpretability of each component of the framework.
📝 Abstract
Few-normal-shot anomaly detection (FNSAD) aims to detect abnormal regions in images using only a few normal training samples, a task that is highly challenging due to limited supervision and the diversity of potential defects. Recent approaches leverage vision-language models such as CLIP with prompt-based learning to align image and text features. However, existing methods often exhibit weak discriminability between normal and abnormal prompts and lack principled scoring mechanisms for patch-level anomalies. We propose a deviation-guided prompt learning framework that integrates the semantic power of vision-language models with the statistical reliability of deviation-based scoring. Specifically, we replace fixed prompt prefixes with learnable context vectors shared across normal and abnormal prompts, while anomaly-specific suffix tokens enable class-aware alignment. To enhance separability, we introduce a deviation loss with Top-K Multiple Instance Learning (MIL), modeling patch-level features as Gaussian deviations from the normal distribution. This allows the network to assign higher anomaly scores to patches with statistically significant deviations, improving both localization and interpretability. Experiments on the MVTec AD and VisA benchmarks demonstrate superior pixel-level detection performance compared to PromptAD and other baselines. Ablation studies further validate the effectiveness of learnable prompts, deviation-based scoring, and the Top-K MIL strategy.
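To make the scoring mechanism concrete, the sketch below illustrates one plausible reading of the abstract's pipeline: patch-level anomaly scores are standardized against a Gaussian model of the normal distribution, the image-level score aggregates the Top-K largest patch deviations (the MIL step), and a hinge-style deviation loss pushes normal images toward zero deviation and anomalous images beyond a margin. This is a minimal illustration, not the paper's implementation; the function names, the reference statistics `mu`/`sigma`, and the margin value are all assumptions.

```python
import numpy as np

def deviation_scores(patch_scores, mu, sigma):
    # Standardize raw per-patch anomaly scores against the normal
    # reference distribution, assumed Gaussian with mean `mu` and
    # std `sigma` (both estimated from the few normal samples).
    return (patch_scores - mu) / sigma

def topk_mil_score(dev_scores, k):
    # Top-K MIL aggregation: the image-level anomaly score is the
    # mean of the K largest patch deviations, so a few strongly
    # deviating patches dominate the decision.
    topk = np.sort(dev_scores)[-k:]
    return float(topk.mean())

def deviation_loss(image_score, is_anomalous, margin=5.0):
    # Hinge-style deviation loss (margin in units of standard
    # deviations, here an assumed value): normal images are pulled
    # toward zero deviation, anomalous ones pushed past the margin.
    if is_anomalous:
        return max(0.0, margin - image_score)
    return abs(image_score)

# Toy usage: 16 patch scores near the normal mean, one corrupted.
mu, sigma = 0.5, 0.1
normal_patches = np.full(16, 0.5)
anom_patches = normal_patches.copy()
anom_patches[0] = 1.5  # 10 sigma above the normal mean

s_norm = topk_mil_score(deviation_scores(normal_patches, mu, sigma), k=3)
s_anom = topk_mil_score(deviation_scores(anom_patches, mu, sigma), k=1)
```

A normal image yields a near-zero Top-K score (and hence near-zero loss), while the single corrupted patch drives the anomalous image's score far above the margin, so its loss term vanishes only once the deviation is statistically significant.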