🤖 AI Summary
This work addresses two challenges in industrial multi-class anomaly detection: the high computational cost of training a separate model per class, and the poor robustness of jointly modeling heterogeneous defects. The authors propose FPFNet, a unified detection framework built around a feature perturbation pool. The pool enriches the training distribution by randomly injecting perturbations (Gaussian noise, F-Noise, and F-Drop) into the extracted features, without adding learnable parameters or computational complexity. Combined with a multi-layer fusion strategy that aggregates encoder and decoder features through residual connections and normalization, the model generalizes better under domain shifts and to unseen defects. FPFNet reports state-of-the-art image-/pixel-level AUROC scores of 97.17%/96.93% on MVTec-AD and 91.08%/99.08% on VisA, surpassing existing methods.
📝 Abstract
Multi-class defect detection is a critical yet challenging task in industrial quality inspection. Existing approaches typically suffer from two fundamental limitations: (i) they require a separate model for each defect category, which incurs substantial computational and memory overhead, and (ii) their robustness degrades because of inter-class feature perturbation when heterogeneous defect categories are modeled jointly. In this paper, we present FPFNet, a Feature Perturbation Pool-based Fusion Network that combines a stochastic feature perturbation pool with a multi-layer feature fusion strategy to address both limitations within a unified detection framework. The feature perturbation pool enriches the training distribution by randomly injecting diverse noise patterns (Gaussian noise, F-Noise, and F-Drop) into the extracted feature representations, thereby strengthening the model's robustness against domain shifts and unseen defect morphologies. The multi-layer feature fusion module aggregates hierarchical feature representations from both the encoder and the decoder through residual connections and normalization, enabling the network to capture complex cross-scale relationships while preserving the fine-grained spatial details essential for precise defect localization. Built upon the UniAD architecture~\cite{you2022unified}, our method achieves state-of-the-art performance on two widely adopted benchmarks: 97.17\% image-level AUROC and 96.93\% pixel-level AUROC on MVTec-AD, and 91.08\% image-level AUROC and 99.08\% pixel-level AUROC on VisA, surpassing existing methods by notable margins while introducing no additional learnable parameters or computational complexity.
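The two mechanisms described above can be illustrated with a minimal, parameter-free sketch in plain Python. It is not the authors' implementation, and several details are assumptions. The abstract does not define F-Noise or F-Drop, so `f_noise` (Gaussian noise scaled by each token's L2 norm) and `f_drop` (random zeroing of feature values) are hypothetical stand-ins. Choosing exactly one perturbation per call, the default strengths `sigma`, `alpha`, and `p`, and the add-then-normalize form of `fuse` are also illustrative choices.

```python
import math
import random


def gaussian_noise(tokens, rng, sigma=0.1):
    """Add zero-mean Gaussian noise with a fixed standard deviation."""
    return [[v + rng.gauss(0.0, sigma) for v in tok] for tok in tokens]


def f_noise(tokens, rng, alpha=0.1):
    """ASSUMED form of F-Noise: noise scaled by each token's L2 norm."""
    out = []
    for tok in tokens:
        sigma = alpha * math.sqrt(sum(v * v for v in tok)) / len(tok)
        out.append([v + rng.gauss(0.0, sigma) for v in tok])
    return out


def f_drop(tokens, rng, p=0.2):
    """ASSUMED form of F-Drop: zero each value independently with probability p."""
    return [[0.0 if rng.random() < p else v for v in tok] for tok in tokens]


# The "pool": a fixed set of perturbations with no learnable parameters.
PERTURBATION_POOL = (gaussian_noise, f_noise, f_drop)


def perturb(tokens, rng, training=True):
    """Apply one randomly chosen perturbation in training; identity otherwise."""
    if not training:
        return tokens
    return rng.choice(PERTURBATION_POOL)(tokens, rng)


def normalize(tok, eps=1e-6):
    """Parameter-free normalization of one token to zero mean, unit variance."""
    mean = sum(tok) / len(tok)
    var = sum((v - mean) ** 2 for v in tok) / len(tok)
    return [(v - mean) / math.sqrt(var + eps) for v in tok]


def fuse(encoder_tokens, decoder_tokens):
    """ASSUMED fusion: residual sum of matching tokens, then normalization."""
    return [
        normalize([e + d for e, d in zip(enc, dec)])
        for enc, dec in zip(encoder_tokens, decoder_tokens)
    ]


# Toy usage: two feature tokens with four channels each.
rng = random.Random(0)
features = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 1.5, -1.5]]
noisy = perturb(features, rng)   # training-time perturbation
fused = fuse(features, noisy)    # stand-in for encoder/decoder feature fusion
```

Because every function in the pool is a fixed stochastic transform and `perturb` is the identity outside training, the sketch adds no trainable weights and no inference-time cost, which is consistent with the parameter and complexity claim in the abstract.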