🤖 AI Summary
To address insufficient modeling of normal samples and low localization accuracy in unsupervised industrial visual anomaly detection, this paper proposes a patch-aware dynamic vector quantization (VQ) method. The core contributions are: (1) a context-sensitive dynamic codebook allocation mechanism that adaptively assigns codewords based on local patch semantics to mitigate mode collapse; and (2) an enhanced VQ-VAE framework that integrates local structural modeling, online codebook updating, and reconstruction constraint optimization to construct a compact, discriminative feature representation space. Evaluated on the MVTec-AD, BTAD, and MTSD benchmarks, the method is reported to achieve state-of-the-art performance in both image-level and pixel-level anomaly detection, with improved anomaly localization accuracy and cross-dataset generalization.
📝 Abstract
Unsupervised visual defect detection is critical in industrial applications, requiring a representation space that captures normal data features while detecting deviations. Achieving a balance between expressiveness and compactness is challenging; an overly expressive space risks inefficiency and mode collapse, impairing detection accuracy. We propose a novel approach using an enhanced VQ-VAE framework optimized for unsupervised defect detection. Our model introduces a patch-aware dynamic code assignment scheme, enabling context-sensitive code allocation to optimize spatial representation. This strategy enhances normal-defect distinction and improves detection accuracy during inference. Experiments on MVTecAD, BTAD, and MTSD datasets show our method achieves state-of-the-art performance.
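To make the vector-quantization idea concrete, the following is a minimal sketch of the standard nearest-codeword lookup that VQ-VAE models build on, together with one plausible way to turn the quantization error into anomaly scores. It is not the paper's patch-aware dynamic assignment scheme, which the abstract does not specify in detail. The function names `quantize_patches` and `anomaly_scores`, the use of Euclidean distance, and the max-over-patches image score are all assumptions made for illustration.

```python
import math


def quantize_patches(patches, codebook):
    """Assign each patch feature to its nearest codeword (plain VQ lookup).

    patches:  list of N feature vectors, one per image patch
    codebook: list of K codewords with the same dimensionality
    Returns (indices, distances): the nearest codeword index and the
    Euclidean distance to it, for every patch.
    """
    indices, distances = [], []
    for patch in patches:
        dists = [math.dist(patch, code) for code in codebook]
        nearest = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(nearest)
        distances.append(dists[nearest])
    return indices, distances


def anomaly_scores(patches, codebook):
    """Hypothetical scoring rule (an assumption, not the paper's method):
    the quantization error is the patch-level score, and the largest
    patch score is the image-level score."""
    _, distances = quantize_patches(patches, codebook)
    return distances, max(distances)


# Toy example: a 2-codeword codebook learned from "normal" features.
codebook = [[0.0, 0.0], [1.0, 1.0]]
# Two patches close to the codebook and one far from it (a "defect").
patches = [[0.1, 0.0], [0.9, 1.0], [3.0, 3.0]]

indices, distances = quantize_patches(patches, codebook)
patch_scores, image_score = anomaly_scores(patches, codebook)
```

In this toy setup the third patch lies far from every codeword, so it receives the largest quantization error and dominates the image-level score. A compact codebook fitted only to normal data is what makes such deviations stand out, which is the balance between expressiveness and compactness that the abstract describes.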