Referring Industrial Anomaly Segmentation

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a language-guided, open-set paradigm for industrial anomaly segmentation that addresses key limitations of traditional methods: coarse localization, reliance on handcrafted thresholds, overfitting due to data scarcity, and the closed-world assumption of “one model per anomaly type.” By using textual descriptions to generate fine-grained anomaly masks, the approach lets a single model detect diverse anomalies. The study introduces MVTec-Ref, a dataset built around referring expressions in which 95% of anomalies are small, and presents DQFormer, a transformer architecture that uses only two query tokens (“anomaly” and “background”) together with Language-Gated Multi-Level Aggregation (LMA) for efficient vision-language fusion. Experiments demonstrate the effectiveness of the approach in moving industrial anomaly detection toward open-set, fine-grained, and generalizable solutions.

📝 Abstract
Industrial Anomaly Detection (IAD) is vital for manufacturing, yet traditional methods face significant challenges: unsupervised approaches yield rough localizations requiring manual thresholds, while supervised methods overfit due to scarce, imbalanced data. Both suffer from the "One Anomaly Class, One Model" limitation. To address this, we propose Referring Industrial Anomaly Segmentation (RIAS), a paradigm leveraging language to guide detection. RIAS generates precise masks from text descriptions without manual thresholds and uses universal prompts to detect diverse anomalies with a single model. We introduce the MVTec-Ref dataset to support this, designed with diverse referring expressions and focusing on anomaly patterns, notably with 95% small anomalies. We also propose the Dual Query Token with Mask Group Transformer (DQFormer) benchmark, enhanced by Language-Gated Multi-Level Aggregation (LMA) to improve multi-scale segmentation. Unlike traditional methods using redundant queries, DQFormer employs only "Anomaly" and "Background" tokens for efficient visual-textual integration. Experiments demonstrate RIAS's effectiveness in advancing IAD toward open-set capabilities. Code: https://github.com/swagger-coder/RIAS-MVTec-Ref.
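The two-token idea in the abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' DQFormer: it assumes each pixel already has a feature vector and that the "Anomaly" and "Background" queries are plain vectors already conditioned on the text. The function name `dual_query_mask` and all values are hypothetical. It shows only why comparing two query scores per pixel yields a binary mask without a hand-tuned threshold.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dual_query_mask(
    pixel_features: List[List[Vector]],
    anomaly_query: Vector,
    background_query: Vector,
) -> List[List[int]]:
    """Label each pixel 1 (anomaly) or 0 (background).

    Hypothetical illustration: a pixel is marked anomalous when its score
    against the anomaly query exceeds its score against the background
    query, so no manual threshold is involved. Ties go to background.
    """
    mask = []
    for row in pixel_features:
        mask_row = []
        for feat in row:
            anomaly_score = dot(feat, anomaly_query)
            background_score = dot(feat, background_query)
            mask_row.append(1 if anomaly_score > background_score else 0)
        mask.append(mask_row)
    return mask


# Toy 2x3 feature map with 2-dimensional features (made-up values).
features = [
    [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9]],
    [[1.0, 0.0], [0.2, 0.8], [0.0, 1.0]],
]
anomaly_q = [0.0, 1.0]      # assumed text-conditioned "Anomaly" query
background_q = [1.0, 0.0]   # assumed text-conditioned "Background" query

mask = dual_query_mask(features, anomaly_q, background_q)
print(mask)  # [[0, 0, 1], [0, 1, 1]]
```

In a real model the queries and features would be learned and fused with language features at multiple scales; the sketch omits all of that and keeps only the per-pixel two-way comparison.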
Problem

Research questions and friction points this paper is trying to address.

Industrial Anomaly Detection
Anomaly Segmentation
One Anomaly Class One Model
Open-set Detection
Small Anomalies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Referring Industrial Anomaly Segmentation
Language-Guided Detection
Universal Prompting
Dual Query Token
Open-Set Anomaly Detection
Pengfei Yue
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China.
Xiaokang Jiang
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China.
Yilin Lu
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China.
Jianghang Lin
Xiamen University
Multimodal Large Language Model, Vision-Language Model, Semi/Weakly-Supervised Learning
Shengchuan Zhang
Xiamen University
Computer Vision, Machine Learning
Liujuan Cao
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China.