🤖 AI Summary
To address performance degradation in industrial anomaly detection when multimodal inputs are incomplete at inference time, i.e., when some modalities (e.g., RGB, thermal, acoustic) are unavailable, this paper proposes the first cross-modal knowledge distillation framework tailored to incomplete multimodal settings. Methodologically: (1) a modality-agnostic feature alignment mechanism enforces semantic consistency across heterogeneous modalities via contrastive learning; (2) an uncertainty-aware teacher–student architecture combines weighted knowledge distillation with a lightweight multi-branch student network; (3) a modality-missing robust training strategy removes the need for modality imputation. Evaluated on MVTec-AD and VisA, the framework achieves over 92.3% AUROC at a 40% missing-modality rate, substantially outperforming unimodal and imputation-based baselines, and provides the first empirical validation of cross-modal distillation's effectiveness and robustness for incomplete multimodal anomaly detection.
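Two of the components above, missing-modality-robust training and uncertainty-weighted distillation, can be illustrated with a minimal NumPy sketch. All names, the inverse-uncertainty weighting, and the MSE distillation objective here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def modality_dropout_mask(n_modalities, drop_rate, rng):
    """Randomly drop modalities during training to simulate missing
    inputs (a common robustness strategy); always keep at least one."""
    mask = rng.random(n_modalities) >= drop_rate
    if not mask.any():
        mask[rng.integers(n_modalities)] = True
    return mask

def uncertainty_weighted_distill_loss(teacher_feats, student_feats,
                                      uncertainties, mask):
    """Distillation loss as an uncertainty-weighted MSE over the
    modalities that are present (hypothetical weighting scheme).

    Weights decay exponentially with teacher uncertainty and are
    renormalized over available modalities, so confident teacher
    modalities contribute more to the student's objective.
    """
    u = np.asarray(uncertainties, dtype=float)
    w = np.exp(-u) * mask              # zero out missing modalities
    w = w / w.sum()                    # renormalize remaining weights
    per_mod = np.array([np.mean((t - s) ** 2)
                        for t, s in zip(teacher_feats, student_feats)])
    return float(np.sum(w * per_mod))

# Toy usage: 3 modalities (e.g., RGB, thermal, acoustic), 4-dim features.
rng = np.random.default_rng(0)
mask = modality_dropout_mask(3, drop_rate=0.4, rng=rng)
teacher = [rng.normal(size=4) for _ in range(3)]
student = [f + 0.1 * rng.normal(size=4) for f in teacher]
loss = uncertainty_weighted_distill_loss(teacher, student,
                                         uncertainties=[0.1, 0.5, 0.2],
                                         mask=mask)
```

Because the weights are masked before normalization, absent modalities are simply excluded from the objective rather than imputed, which is the key property the summary attributes to the robust training strategy.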