🤖 AI Summary
In multi-class unsupervised anomaly detection (MUAD), Transformer-based models suffer from an *identity shortcut*: inputs are trivially copied to outputs via identity mappings, shrinking the gap between normal and anomalous reconstruction errors. To address this, we propose a unified feature-reconstruction framework: (1) a low-rank noisy bottleneck suppresses redundant feature replication, and (2) a global perturbation attention mechanism disrupts information shortcuts in the decoder. We theoretically show that this design prevents trivial identity reproduction. Evaluated on four benchmarks (MVTec-AD, ViSA, Real-IAD, and Universal Medical), our method achieves state-of-the-art image-level AUROC scores of 99.8%, 98.9%, 90.6%, and 87.8%, respectively. Our core contribution is the first systematic identification and mitigation of identity shortcuts in MUAD Transformers, establishing an interpretable, robust, and generalizable paradigm for cross-category anomaly detection.
📝 Abstract
Multi-class unsupervised anomaly detection (MUAD) has garnered growing research interest, as it seeks a single unified model for anomaly detection across multiple classes, eliminating the need to train separate models for distinct objects and thereby saving substantial computational resources. Under the MUAD setting, advanced Transformer-based architectures have brought significant performance improvements, yet identity shortcuts persist: the model directly copies inputs to outputs, narrowing the gap in reconstruction errors between normal and abnormal cases and making the two harder to distinguish. To address this issue, we propose ShortcutBreaker, a novel unified feature-reconstruction framework for MUAD tasks, featuring two key innovations. First, drawing on the matrix rank inequality, we design a low-rank noisy bottleneck (LRNB) that projects high-dimensional features into a low-rank latent space, and we theoretically demonstrate its capacity to prevent trivial identity reproduction. Second, leveraging the ViT's global modeling capability rather than focusing only on local features, we incorporate a global perturbation attention mechanism to prevent information shortcuts in the decoders. Extensive experiments are performed on four widely used anomaly detection benchmarks: three industrial datasets (MVTec-AD, ViSA, and Real-IAD) and one medical dataset (Universal Medical). The proposed method achieves remarkable image-level AUROCs of 99.8%, 98.9%, 90.6%, and 87.8% on these four datasets, respectively, consistently outperforming previous MUAD methods across diverse scenarios.
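The two mechanisms described above can be sketched minimally. The NumPy code below is an illustrative sketch under our own assumptions (all names, dimensions, and the additive-Gaussian noise form are ours, not the paper's): part (1) shows why a rank-r bottleneck cannot realize a d-dimensional identity map, per the matrix rank inequality; part (2) shows attention whose logits are globally perturbed so a near-diagonal copy-through pattern cannot persist.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- (1) Low-rank noisy bottleneck (illustrative) -----------------------
# Features of dimension d are projected down to rank r << d and back up.
# By the rank inequality, rank(W_up @ W_down) <= r < d, so the composite
# linear map can never equal the d x d identity: a trivial copy is ruled out.
d, r = 768, 64
W_down = rng.standard_normal((r, d)) / np.sqrt(d)   # d -> r projection
W_up = rng.standard_normal((d, r)) / np.sqrt(r)     # r -> d projection

composite = W_up @ W_down                           # d x d, but rank <= r
print(np.linalg.matrix_rank(composite))             # 64, far below d = 768

# Latent noise (our assumed form) further discourages memorization:
x = rng.standard_normal(d)
z = W_down @ x + 0.1 * rng.standard_normal(r)       # noisy low-rank latent
x_rec = W_up @ z                                    # reconstruction, shape (768,)

# --- (2) Globally perturbed attention (illustrative) --------------------
def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def perturbed_attention(q, k, v, noise_std=0.5):
    """Scaled dot-product attention with global noise on the logits, so
    the decoder cannot settle on a near-diagonal (input-copying) pattern."""
    dk = q.shape[-1]
    logits = q @ k.T / np.sqrt(dk)                  # (N, N) attention logits
    logits = logits + noise_std * rng.standard_normal(logits.shape)
    return softmax(logits) @ v                      # globally mixed values

q = k = v = rng.standard_normal((196, 64))          # identical q/k invite a diagonal
out = perturbed_attention(q, k, v)
print(out.shape)  # (196, 64)
```

This is a sketch of the underlying ideas only; the paper's actual LRNB and attention designs may differ in where noise is injected and how the perturbation is structured.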