🤖 AI Summary
This work addresses a challenge in zero-shot anomaly detection: anomaly patterns are heterogeneous, ranging from structural discontinuities to subtle diffuse changes, so a single adaptation strategy is often ineffective. The authors propose EntroAD, a framework that introduces what is described as the first structural entropy-based, anomaly-aware token routing mechanism. Self-attention is used to estimate patch-wise relations, from which patch-level structural entropy is computed; this entropy signal dynamically routes different anomaly types through dedicated prompt adaptation pathways. A confidence-aware dual-branch prompt adaptation module further stabilizes vision-language alignment while preserving CLIP's transferable prior. Evaluated on ten industrial and medical benchmark datasets, EntroAD is reported to achieve state-of-the-art performance in cross-dataset zero-shot anomaly detection.
📝 Abstract
Zero-Shot Anomaly Detection (ZSAD) aims to detect anomalies in unseen domains without target-domain adaptation. Recent CLIP-based methods have shown promising performance by leveraging prompt learning and visual-text alignment. However, most existing approaches rely on a single adaptation pathway, which may be insufficient for heterogeneous anomaly patterns across domains. In practice, anomalies exhibit vastly different characteristics, ranging from salient, localized structural disruptions to subtle, diffuse, and irregular variations. To address this challenge, we propose EntroAD, a structural entropy-guided zero-shot anomaly detection framework. Unlike previous methods, EntroAD introduces a dynamic routing mechanism to process different types of anomalies with specialized adaptation strategies. Specifically, we estimate patch-level structural entropy from self-attention-induced patch relations and use it as a proxy for relational uncertainty to guide anomaly-aware token routing. Based on this routing signal, we construct anomaly-aware routed tokens to better capture anomaly cues with different structural characteristics. We further introduce a confidence-aware dual-branch prompt adaptation module to stabilize visual-text alignment while preserving CLIP's transferable prior. Extensive experiments on 10 industrial and medical benchmarks show that EntroAD achieves state-of-the-art performance in challenging cross-dataset ZSAD settings.
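The abstract does not give the formula for patch-level structural entropy or the routing rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (a) the self-attention matrix is symmetrized into a weighted patch graph, (b) each patch's score is its term in the one-dimensional structural entropy of that graph, -(d_i / vol) log2(d_i / vol), where d_i is the weighted degree and vol is the sum of all degrees, and (c) routing is a simple threshold at the mean score. The function names `patch_structural_entropy` and `route_tokens` are hypothetical.

```python
import math


def patch_structural_entropy(attn):
    """Per-patch term of the one-dimensional structural entropy.

    attn: square list of lists, attn[i][j] = attention from patch i to patch j.
    Assumption: the patch graph is the symmetrized attention matrix with
    self-loops ignored. This is an illustrative choice, not taken from the paper.
    """
    n = len(attn)
    # Symmetrize so the attention scores define an undirected weighted graph.
    w = [[0.5 * (attn[i][j] + attn[j][i]) for j in range(n)] for i in range(n)]
    # Weighted degree of each patch, excluding self-attention.
    deg = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    vol = sum(deg)
    scores = []
    for d in deg:
        p = d / vol if vol > 0 else 0.0
        scores.append(-p * math.log2(p) if p > 0 else 0.0)
    return scores


def route_tokens(scores, threshold=None):
    """Split patch indices into a high-entropy and a low-entropy route.

    Assumption: a hard threshold at the mean score. The paper may use a
    learned or soft routing rule instead.
    """
    if threshold is None:
        threshold = sum(scores) / len(scores)
    high = [i for i, s in enumerate(scores) if s > threshold]
    low = [i for i, s in enumerate(scores) if s <= threshold]
    return high, low


# Toy example: 4 patches, three that attend mostly to themselves and one
# that spreads its attention uniformly.
toy_attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
toy_scores = patch_structural_entropy(toy_attn)
toy_high, toy_low = route_tokens(toy_scores)
```

In a full pipeline each route would feed its own prompt adaptation pathway. Which route corresponds to localized structural disruptions and which to diffuse variations is not stated in the abstract, so the sketch leaves that mapping open.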