🤖 AI Summary
This work addresses a weakness of reconstruction-based methods in unsupervised anomaly detection: poor generalization and limited cross-scenario applicability caused by the “identical shortcut” problem, in which anomalous regions are reconstructed as well as normal ones. To overcome this, the authors propose FSR, a general-purpose framework that partitions multi-scale semantic features into non-overlapping blocks, randomly shuffles them, and trains a network to restore them, with an adjustable shuffling rate controlling task difficulty. This design compels the model to learn global contextual information rather than rely on local shortcuts. Theoretical analysis from both network-structure and mutual-information perspectives supports the effectiveness of the approach. Extensive experiments demonstrate that FSR significantly outperforms existing methods across diverse industrial inspection scenarios, exhibiting superior generalizability, robustness, and computational efficiency.
📝 Abstract
Unsupervised anomaly detection is vital in industrial fields, and reconstruction-based methods are favored for their simplicity and effectiveness. However, these methods often suffer from the identical shortcut issue: both normal and anomalous regions are reconstructed well, so the model fails to identify outliers. The severity of this problem increases with the complexity of the normal data distribution. Consequently, existing methods may perform excellently in one scenario yet degrade sharply when transferred to another. This paper focuses on establishing a universal model applicable to anomaly detection tasks across different settings, which we term universal anomaly detection. We introduce a novel, straightforward yet efficient framework for universal anomaly detection, Feature Shuffling and Restoration (FSR), which alleviates the identical shortcut issue across different settings. First, FSR uses multi-scale features with rich semantic information, rather than raw image pixels, as reconstruction targets. These multi-scale features are then partitioned into non-overlapping feature blocks, which are randomly shuffled and restored to their original state by a restoration network. This simple paradigm encourages the model to focus more on global contextual information. We also introduce a novel concept, the shuffling rate, to regulate the complexity of the FSR task, thereby alleviating the identical shortcut across different settings. Furthermore, we provide theoretical explanations for the effectiveness of the FSR framework from two perspectives: network structure and mutual information. Extensive experimental results validate the superiority and efficiency of the FSR framework across different settings. Code is available at https://github.com/luow23/FSR.
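The block-shuffling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The function names, the `(C, H, W)` feature layout, and the reading of the shuffling rate as the fraction of blocks selected for permutation are all assumptions made for illustration. In FSR the restoration is learned by a network; the inverse permutation below only serves to check that the shuffle is lossless.

```python
import numpy as np


def _to_blocks(feat, block_size):
    """Split a (C, H, W) map into (N, C, b, b) non-overlapping blocks."""
    c, h, w = feat.shape
    if h % block_size or w % block_size:
        raise ValueError("H and W must be divisible by block_size")
    gh, gw = h // block_size, w // block_size
    blocks = feat.reshape(c, gh, block_size, gw, block_size)
    blocks = blocks.transpose(1, 3, 0, 2, 4)  # (gh, gw, C, b, b)
    return blocks.reshape(gh * gw, c, block_size, block_size), (gh, gw)


def _from_blocks(blocks, grid):
    """Inverse of _to_blocks: reassemble (N, C, b, b) into (C, H, W)."""
    gh, gw = grid
    _, c, b, _ = blocks.shape
    out = blocks.reshape(gh, gw, c, b, b).transpose(2, 0, 3, 1, 4)
    return out.reshape(c, gh * b, gw * b)


def shuffle_feature_blocks(feat, block_size, shuffle_rate, rng):
    """Randomly permute a fraction of the blocks of one feature map.

    shuffle_rate in [0, 1] is interpreted here (an assumption) as the
    fraction of blocks that take part in the permutation. A selected
    block can still land in its own position by chance.
    Returns the shuffled map and the permutation that was applied.
    """
    if not 0.0 <= shuffle_rate <= 1.0:
        raise ValueError("shuffle_rate must lie in [0, 1]")
    blocks, grid = _to_blocks(feat, block_size)
    n = blocks.shape[0]
    k = int(round(shuffle_rate * n))
    perm = np.arange(n)
    chosen = rng.choice(n, size=k, replace=False)
    perm[chosen] = rng.permutation(chosen)  # permute only the chosen slots
    return _from_blocks(blocks[perm], grid), perm


def unshuffle_feature_blocks(shuffled, block_size, perm):
    """Oracle restoration: undo a known permutation (sanity check only)."""
    blocks, grid = _to_blocks(shuffled, block_size)
    return _from_blocks(blocks[np.argsort(perm)], grid)


# Usage: shuffle half of the 2x2 blocks of a toy 8-channel 4x4 feature map.
rng = np.random.default_rng(0)
toy_feat = rng.standard_normal((8, 4, 4))
toy_shuffled, toy_perm = shuffle_feature_blocks(toy_feat, 2, 0.5, rng)
toy_restored = unshuffle_feature_blocks(toy_shuffled, 2, toy_perm)
```

In a training loop, `toy_shuffled` would be the input to the restoration network and `toy_feat` the target. Under this reading, a higher shuffling rate displaces more blocks and so makes the restoration task harder.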