🤖 AI Summary
To address the high cost of visual annotation and the poor robustness of defect identification under small-sample conditions in laser powder bed fusion (LPBF) in-situ monitoring, this paper proposes an audio-visual cross-modal knowledge transfer framework. Leveraging unlabeled acoustic signals as a weak supervisory signal, it introduces what the authors describe as the first bidirectional cross-modal distillation mechanism: semantic knowledge from audio is transferred to the vision model via multi-scale time-frequency representations and contrastive cross-modal alignment, while a teacher-student architecture combined with self-supervised feature disentanglement makes efficient use of unlabeled visual data. Evaluated on real-world LPBF production-line data, the method achieves 92.3% accuracy on porosity and crack detection, reduces annotation requirements by 76%, and lowers the false-positive rate by 41%. These results substantially improve small-sample anomaly detection and practical deployability in industrial additive manufacturing settings.
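The summary gives no implementation details, but the contrastive cross-modal alignment it mentions is typically realized as a symmetric InfoNCE-style objective between paired audio and visual embeddings. The sketch below shows that general idea only; the function name, temperature value, and embedding dimensions are illustrative assumptions and not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def cross_modal_infonce(audio_emb: torch.Tensor,
                        visual_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss that pulls paired audio/visual embeddings
    together and pushes apart embeddings from different melt-pool events.

    audio_emb, visual_emb: (batch, dim) features from the two encoders;
    row i of each tensor is assumed to come from the same in-situ event.
    (Hypothetical sketch; not the paper's exact loss.)
    """
    # L2-normalize so dot products become cosine similarities
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)

    # Pairwise similarity matrix, scaled by temperature
    logits = a @ v.t() / temperature            # (batch, batch)
    targets = torch.arange(a.size(0), device=a.device)

    # Audio-to-visual and visual-to-audio retrieval losses, averaged
    loss_a2v = F.cross_entropy(logits, targets)
    loss_v2a = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_a2v + loss_v2a)


if __name__ == "__main__":
    # Toy usage: random tensors stand in for pooled multi-scale
    # spectrogram features and melt-pool image features.
    audio = torch.randn(8, 256)
    video = torch.randn(8, 256)
    print(cross_modal_infonce(audio, video).item())
```

In such a setup, minimizing this loss aligns the two embedding spaces so that acoustic cues can supervise the visual encoder without per-frame defect labels, which is consistent with the weak-supervision role the summary attributes to the audio channel.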