🤖 AI Summary
This work addresses a key limitation of existing unsupervised continual anomaly detection methods: relying on a single visual modality, they struggle to accurately model the manifold of normal patterns in complex scenes, which constrains detection performance. To overcome this, we propose the first multimodal prompt-based unsupervised continual anomaly detection framework. Our approach introduces a Continual Multimodal Prompt Memory Bank (CMPMB), complemented by a defect-semantics-guided Adaptive Normalization Module (ANM) and a Dynamic Fusion Strategy (DFS), enabling effective collaboration among multimodal cues. By moving beyond unimodal modeling, the proposed method achieves state-of-the-art image-level AUROC and pixel-level AUPR on the MVTec AD and VisA benchmarks, while also enhancing adversarial robustness.
📝 Abstract
Unsupervised Continual Anomaly Detection (UCAD) is gaining attention for effectively addressing the catastrophic forgetting and heavy computational burden of traditional Unsupervised Anomaly Detection (UAD). However, existing UCAD approaches rely solely on visual information and are therefore insufficient to capture the manifold of normality in complex scenes, impeding further gains in anomaly detection accuracy. To overcome this limitation, we propose an unsupervised continual anomaly detection framework grounded in multimodal prompting. Specifically, we introduce a Continual Multimodal Prompt Memory Bank (CMPMB) that progressively distills and retains prototypical normal patterns from both the visual and textual domains across consecutive tasks, yielding a richer representation of normality. Furthermore, we devise a Defect-Semantic-Guided Adaptive Fusion Mechanism (DSG-AFM) that integrates an Adaptive Normalization Module (ANM) with a Dynamic Fusion Strategy (DFS) to jointly enhance detection accuracy and adversarial robustness. Experiments on the MVTec AD and VisA benchmarks show that our approach achieves state-of-the-art (SOTA) performance on image-level AUROC and pixel-level AUPR.
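To make the memory-bank idea concrete, here is a minimal NumPy sketch of a task-keyed multimodal prompt memory bank in the spirit of the CMPMB. Everything here is an illustrative assumption, not the paper's implementation: the class and method names (`MultimodalPromptMemoryBank`, `add_task`, `score`), the use of random subsampling as a stand-in for prototype distillation, and the fixed 50/50 fusion of visual and textual distances (the paper's DSG-AFM weighs these adaptively).

```python
import numpy as np


class MultimodalPromptMemoryBank:
    """Toy sketch of a task-keyed multimodal prompt memory bank.

    Per task it stores: a key (mean visual feature) used to identify
    the task at test time, a small set of visual prototypes of normal
    features, and one text-prompt embedding. All names are hypothetical.
    """

    def __init__(self):
        self.task_keys = []      # one key vector per task
        self.visual_protos = []  # per task: (P, D) normal prototypes
        self.text_prompts = []   # per task: (D,) text-prompt embedding

    def add_task(self, normal_feats, text_embed, n_protos=8):
        """Distill a new task's normal features into a few prototypes.

        Random subsampling stands in for whatever coreset/distillation
        procedure a real system would use.
        """
        idx = np.random.default_rng(0).choice(
            len(normal_feats), size=n_protos, replace=False)
        self.task_keys.append(normal_feats.mean(axis=0))
        self.visual_protos.append(normal_feats[idx])
        self.text_prompts.append(text_embed)

    def score(self, feat):
        """Pick the closest task by key, then score the feature.

        Visual term: distance to the nearest stored normal prototype.
        Textual term: cosine dissimilarity to the task's text prompt.
        The equal-weight fusion below is a placeholder assumption.
        """
        keys = np.stack(self.task_keys)
        t = int(np.argmin(np.linalg.norm(keys - feat, axis=1)))
        d_vis = np.linalg.norm(self.visual_protos[t] - feat, axis=1).min()
        d_txt = 1.0 - float(feat @ self.text_prompts[t]) / (
            np.linalg.norm(feat) * np.linalg.norm(self.text_prompts[t]) + 1e-8)
        return 0.5 * d_vis + 0.5 * d_txt, t
```

Usage follows the continual setting described in the abstract: tasks arrive one at a time, each call to `add_task` stores only a compact set of prompts (so old tasks are not revisited), and `score` first routes a test feature to its task via the key before computing the anomaly score.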