🤖 AI Summary
This work proposes a language-guided, open-set paradigm for industrial anomaly segmentation that addresses key limitations of traditional methods: coarse localization, reliance on handcrafted thresholds, overfitting due to data scarcity, and the closed-world assumption of “one model per anomaly type.” By generating fine-grained anomaly masks from textual descriptions, the approach lets a single model detect diverse anomalies. The study introduces MVTec-Ref, the first dataset supporting referring expressions and small-scale anomalies, and presents DQFormer, a transformer architecture that uses only two query tokens (“anomaly” and “background”) together with Language-Gated Multi-level Aggregation (LMA) for efficient vision-language fusion. Experiments demonstrate significant improvements in localization accuracy, advancing industrial anomaly detection toward open-set, fine-grained, and generalizable solutions.
📝 Abstract
Industrial Anomaly Detection (IAD) is vital for manufacturing, yet traditional methods face significant challenges: unsupervised approaches yield rough localizations that require manual thresholds, while supervised methods overfit because data are scarce and imbalanced. Both suffer from the "One Anomaly Class, One Model" limitation. To address this, we propose Referring Industrial Anomaly Segmentation (RIAS), a paradigm that uses language to guide detection. RIAS generates precise masks from text descriptions without manual thresholds and uses universal prompts to detect diverse anomalies with a single model. To support this paradigm, we introduce the MVTec-Ref dataset, which is designed with diverse referring expressions and focuses on anomaly patterns; notably, 95% of its anomalies are small. We also propose the Dual Query Token with Mask Group Transformer (DQFormer) benchmark, enhanced by Language-Gated Multi-Level Aggregation (LMA) to improve multi-scale segmentation. Unlike traditional methods that use redundant queries, DQFormer employs only "Anomaly" and "Background" tokens for efficient visual-textual integration. Experiments demonstrate RIAS's effectiveness in advancing IAD toward open-set capabilities. Code: https://github.com/swagger-coder/RIAS-MVTec-Ref.
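To make the two-token idea concrete, the following is a minimal illustrative sketch, not the authors' DQFormer implementation. It assumes that each pixel feature is first modulated by an element-wise sigmoid gate computed from a text feature (a stand-in for language gating; the actual LMA design is not specified in the abstract), and that a pixel is labeled anomalous when the "anomaly" query scores higher than the "background" query, so no manual threshold is needed. All names (`language_gate`, `dual_query_mask`) and the toy values are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def language_gate(pixel_feat, text_feat):
    """Assumed gating: scale each channel by sigmoid(pixel * text)."""
    return [
        v / (1.0 + math.exp(-v * t))  # v * sigmoid(v * t)
        for v, t in zip(pixel_feat, text_feat)
    ]


def dual_query_mask(pixel_feats, text_feat, anomaly_query, background_query):
    """Return a binary mask: 1 where the anomaly query outscores background."""
    mask = []
    for row in pixel_feats:
        mask_row = []
        for feat in row:
            gated = language_gate(feat, text_feat)
            anomaly_score = dot(gated, anomaly_query)
            background_score = dot(gated, background_query)
            # Two-way comparison replaces a hand-tuned threshold.
            mask_row.append(1 if anomaly_score > background_score else 0)
        mask.append(mask_row)
    return mask


# Toy 2x2 feature map with 2-channel features (hypothetical values).
pixel_feats = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[0.5, 1.0], [3.0, 1.0]],
]
text_feat = [1.0, 0.0]
anomaly_query = [1.0, -1.0]
background_query = [-1.0, 1.0]

mask = dual_query_mask(pixel_feats, text_feat, anomaly_query, background_query)
print(mask)
```

In this sketch the mask comes from comparing two scores per pixel, which is one way a two-query design can avoid the per-dataset threshold tuning that the abstract attributes to unsupervised methods.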