🤖 AI Summary
To address modality missingness in industrial surface defect detection caused by sensor instability, this paper proposes a robust multimodal fusion framework. Methodologically, it introduces three novel prompting mechanisms—cross-modal consistency prompting, modality-specific prompting, and missingness-aware prompting—and employs symmetric contrastive learning with text as a bridging modality to enable complementary RGB and 3D visual feature modeling. Furthermore, it integrates trimodal contrastive pretraining with adversarial text prompt generation to enhance generalization under modality missingness. Experiments demonstrate that, under a combined RGB+3D missingness rate of 0.7, the framework achieves I-AUROC and P-AUROC scores of 73.83% and 93.05%, respectively—surpassing state-of-the-art methods by 3.84% and 5.58%. It consistently outperforms existing approaches across diverse missingness patterns, establishing new benchmarks for robust multimodal industrial defect detection.
📝 Abstract
Multimodal industrial surface defect detection (MISDD) aims to identify and locate defects in industrial products by fusing RGB and 3D modalities. This article focuses on the modality-missing problem in MISDD caused by uncertain sensor availability. In this context, fusing multiple modalities raises several challenges, including learning-mode transformation and information vacancy. To this end, we first propose cross-modal prompt learning, which includes: i) a cross-modal consistency prompt that establishes information consistency between the dual visual modalities; ii) a modality-specific prompt inserted to adapt to different input patterns; and iii) a missing-aware prompt attached to compensate for the information vacancy caused by dynamically missing modalities. In addition, we propose symmetric contrastive learning, which uses the text modality as a bridge for fusing the dual vision modalities. Specifically, a paired antithetical text prompt is designed to generate binary text semantics, and triple-modal contrastive pre-training is introduced to accomplish multimodal learning. Experimental results show that our proposed method achieves 73.83% I-AUROC and 93.05% P-AUROC at a total missing rate of 0.7 for the RGB and 3D modalities (exceeding state-of-the-art methods by 3.84% and 5.58%, respectively), and outperforms existing approaches to varying degrees under different missing types and rates. The source code will be available at https://github.com/SvyJ/MISDD-MM.
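The text-bridged symmetric contrastive objective described in the abstract can be sketched as follows. This is a minimal illustration, assuming a standard symmetric InfoNCE loss applied between each visual modality (RGB, 3D) and a shared text embedding; the function names, embedding sizes, and temperature are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of L2-normalized
    embeddings a, b of shape (N, D); row i of a matches row i of b."""
    logits = a @ b.T / temperature
    labels = np.arange(len(a))

    def xent(l):
        # Cross-entropy with the matched row as the positive class.
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average both directions (a->b and b->a), hence "symmetric".
    return 0.5 * (xent(logits) + xent(logits.T))

def trimodal_loss(rgb, pc, txt):
    """Text acts as the bridge: each visual modality is aligned to the
    text embedding rather than directly to the other visual modality."""
    return info_nce(rgb, txt) + info_nce(pc, txt)

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy batch of 4 samples with 8-dim embeddings (illustrative only).
rng = np.random.default_rng(0)
rgb = l2norm(rng.standard_normal((4, 8)))   # RGB encoder output
pc  = l2norm(rng.standard_normal((4, 8)))   # 3D (point cloud) encoder output
txt = l2norm(rng.standard_normal((4, 8)))   # text encoder output
loss = trimodal_loss(rgb, pc, txt)
```

Because both visual modalities are pulled toward the same text anchor, their representations remain comparable even when one of them is missing at test time, which is the intuition behind using text as the bridging modality.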