🤖 AI Summary
Existing industrial anomaly detection (IAD) datasets, e.g., MVTec 3D, are limited in scale, resolution, and modality diversity, which hinders realistic multimodal IAD research. To address this, we introduce Real-IAD D3, a high-fidelity, real-world-oriented multimodal IAD benchmark supporting 2D RGB, pseudo-3D, and true 3D modalities. It covers 20 industrial part categories and provides micrometer-precision 3D point clouds, 4K RGB images, and high-fidelity pseudo-3D depth maps generated through photometric stereo, the first such integration of photometric stereo for pseudo-3D IAD. We further propose a collaborative fusion network over RGB, point cloud, and pseudo-3D inputs that incorporates cross-modal attention and feature alignment mechanisms. Experiments show improved robustness and accuracy for fine-grained defect detection. The complete dataset, including annotations, and the source code are publicly released, providing a challenging benchmark for multimodal IAD.
📝 Abstract
The increasing complexity of industrial anomaly detection (IAD) has positioned multimodal detection methods as a focal area of machine vision research. However, multimodal datasets tailored specifically for IAD remain limited. Pioneering datasets like MVTec 3D have laid essential groundwork in multimodal IAD by incorporating RGB+3D data, but limitations in scale and resolution still leave a gap between them and real industrial environments. To address these challenges, we introduce Real-IAD D3, a high-precision multimodal dataset that uniquely incorporates an additional pseudo-3D modality generated through photometric stereo, alongside high-resolution RGB images and micrometer-level 3D point clouds. Real-IAD D3 features finer defects, diverse anomalies, and greater scale across 20 categories, providing a challenging benchmark for multimodal IAD. Additionally, we introduce an effective approach that integrates RGB, point cloud, and pseudo-3D depth information to leverage the complementary strengths of each modality, enhancing detection performance. Our experiments highlight the importance of these modalities in boosting detection robustness and overall IAD performance. The dataset and code are publicly accessible for research purposes at https://realiad4ad.github.io/Real-IAD D3