🤖 AI Summary
This work proposes a novel framework for zero-shot anomaly detection that overcomes the limitations of fixed text prompts and spatial-domain-only features, which fail to capture complex semantics and subtle anomalies. The approach integrates multi-frequency visual analysis with semantic adaptability: a variational autoencoder models global semantics and dynamically refines CLIP text embeddings, while wavelet decomposition extracts multi-scale frequency-domain features. A semantic-aware mixture-of-experts module is further introduced to enable fine-grained cross-modal alignment. Notably, this is the first method to combine wavelet-based multi-frequency analysis with mixture-of-experts prompt learning. Extensive experiments on 14 industrial and medical datasets demonstrate significant performance gains over existing approaches, highlighting its superior generalization capability.
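The wavelet step can be pictured with a minimal single-level 2D Haar decomposition. This is a generic illustration of multi-frequency feature extraction, not the paper's implementation; the function name `haar_dwt2` and the toy input are made up for the sketch.

```python
import numpy as np

def haar_dwt2(image):
    """Single-level 2D Haar wavelet decomposition.

    Splits an (H, W) image with even sides into four half-resolution
    sub-bands: LL (low-frequency approximation) plus LH, HL, HH
    (horizontal, vertical, diagonal detail). The detail bands are the
    kind of high-frequency signal where subtle anomalies tend to show up.
    """
    # 1D Haar transform along rows: pairwise averages and differences.
    lo = (image[:, 0::2] + image[:, 1::2]) / 2.0
    hi = (image[:, 0::2] - image[:, 1::2]) / 2.0
    # Repeat along columns of each intermediate result.
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

# A perfectly flat image has all its energy in LL; detail bands vanish.
flat = np.ones((4, 4))
ll, lh, hl, hh = haar_dwt2(flat)
```

Applying `haar_dwt2` recursively to the LL band yields the multi-scale pyramid that "multi-frequency" analysis refers to; practical systems typically use a wavelet library (e.g. PyWavelets) rather than hand-rolled transforms.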
📝 Abstract
Vision-language models have recently shown strong generalization in zero-shot anomaly detection (ZSAD), enabling the detection of unseen anomalies without task-specific supervision. However, existing approaches typically rely on fixed textual prompts, which struggle to capture complex semantics, and focus solely on spatial-domain features, limiting their ability to detect subtle anomalies. To address these challenges, we propose a wavelet-enhanced mixture-of-experts prompt learning method for ZSAD. Specifically, a variational autoencoder is employed to model global semantic representations and integrate them into prompts to enhance adaptability to diverse anomaly patterns. Wavelet decomposition extracts multi-frequency image features that dynamically refine textual embeddings through cross-modal interactions. Furthermore, a semantic-aware mixture-of-experts module is introduced to aggregate contextual information. Extensive experiments on 14 industrial and medical datasets demonstrate the effectiveness of the proposed method.
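The mixture-of-experts idea behind the aggregation module can be sketched in its simplest soft-gated form. This is a generic MoE layer under assumed shapes, not the paper's semantic-aware module; all class and variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MixtureOfExperts:
    """Soft mixture-of-experts: a learned gate scores every expert per
    input token, and the output is the gate-weighted sum of the experts'
    linear projections."""

    def __init__(self, dim, n_experts):
        # Toy random parameters stand in for trained weights.
        self.gate = rng.standard_normal((dim, n_experts)) * 0.02
        self.experts = rng.standard_normal((n_experts, dim, dim)) * 0.02

    def __call__(self, x):                       # x: (tokens, dim)
        weights = softmax(x @ self.gate)         # (tokens, n_experts)
        outs = np.einsum('td,edh->teh', x, self.experts)  # each expert's output
        return np.einsum('te,teh->th', weights, outs)     # gate-weighted mix

moe = MixtureOfExperts(dim=8, n_experts=4)
tokens = rng.standard_normal((5, 8))
y = moe(tokens)  # shape (5, 8), same as the input
```

A "semantic-aware" variant would condition the gate on semantic context (here it sees only the token itself), but the gating-and-mixing mechanics are the same.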