🤖 AI Summary
Existing unsupervised multimodal anomaly detection methods neglect the physical causal logic linking the welding process to its outcome: they treat process modalities (e.g., video, audio, sensor signals) and outcome modalities (e.g., post-weld images) as equivalent feature sources, and they struggle to fuse heterogeneous high-dimensional visual data with low-dimensional sensor signals. Method: This paper proposes Causal-HM, a sensor-guided causal hierarchical modeling framework tailored to robotic welding, which modulates audio-visual features with sensor signals, establishes a unidirectional generative mapping ("process → outcome"), and enforces multimodal physical-consistency constraints. Contribution/Results: By explicitly encoding physical causality, Causal-HM overcomes both the causal-blindness and modality-heterogeneity bottlenecks. Evaluated on Weld-4M, a newly constructed four-modality welding benchmark, it achieves an image-level AUROC (I-AUROC) of 90.7%, significantly surpassing prior state-of-the-art approaches.
📝 Abstract
Multimodal Unsupervised Anomaly Detection (UAD) is critical for quality assurance in smart manufacturing, particularly in complex processes such as robotic welding. However, existing methods often suffer from causal blindness: they treat process modalities (e.g., real-time video, audio, and sensor signals) and result modalities (e.g., post-weld images) as equal feature sources, ignoring the inherent physical generative logic. Furthermore, the heterogeneity gap between high-dimensional visual data and low-dimensional sensor signals frequently causes critical process context to be drowned out. In this paper, we propose Causal-HM, a unified multimodal UAD framework that explicitly models the physical process → result dependency. Our framework incorporates two key innovations: a Sensor-Guided CHM Modulation mechanism that uses low-dimensional sensor signals as context to guide high-dimensional audio-visual feature extraction, and a Causal-Hierarchical Architecture that enforces a unidirectional generative mapping to identify anomalies that violate physical consistency. Extensive experiments on our newly constructed four-modality Weld-4M benchmark demonstrate that Causal-HM achieves a state-of-the-art (SOTA) I-AUROC of 90.7%. Code will be released upon acceptance of the paper.
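The sensor-guided modulation and physical-consistency check described above can be pictured as feature-wise conditioning followed by a prediction-error score. Below is a minimal NumPy sketch, not the authors' implementation: all dimensions, the FiLM-style scale-and-shift form of the modulation, the randomly initialized weights, and the squared-error consistency score are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper):
d_sensor, d_hidden, d_feat = 8, 32, 64

# Randomly initialized weights stand in for trained parameters.
w1 = rng.normal(0, 0.1, (d_sensor, d_hidden)); b1 = np.zeros(d_hidden)
w2 = rng.normal(0, 0.1, (d_hidden, 2 * d_feat)); b2 = np.zeros(2 * d_feat)

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron with ReLU, mapping the sensor context
    # to per-channel modulation parameters (gamma, beta).
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def sensor_guided_modulation(av_feat, sensor):
    # The low-dimensional sensor signal conditions the high-dimensional
    # audio-visual features via a per-channel scale and shift, so the
    # process context is not drowned out by the visual channels.
    params = mlp(sensor, w1, b1, w2, b2)
    gamma, beta = params[:d_feat], params[d_feat:]
    return (1.0 + gamma) * av_feat + beta

def consistency_score(process_feat, result_feat, predict):
    # Anomaly score: deviation of the observed result features from the
    # result predicted by the unidirectional process -> result mapping.
    return float(np.mean((predict(process_feat) - result_feat) ** 2))

# Toy usage: modulate one feature vector, then score its consistency.
av = rng.normal(size=d_feat)
sensor = rng.normal(size=d_sensor)
modulated = sensor_guided_modulation(av, sensor)
identity_map = lambda f: f  # placeholder for a learned generative mapping
score = consistency_score(modulated, modulated, identity_map)
print(modulated.shape, score)  # a perfectly consistent pair scores 0.0
```

A result image whose features diverge from what the process features predict would raise the score, which is the sense in which anomalies "violate physical consistency" above.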