🤖 AI Summary
To address the severe performance degradation caused by missing critical modalities in multimodal learning, this paper proposes a learnable cross-modal knowledge distillation framework. Unlike prior approaches, the method explicitly models dynamic, learnable teacher–student modality relationships, enabling adaptive knowledge transfer under modality absence without requiring any real missing-modality data during training. By integrating contrastive distillation, modality uncertainty modeling, and a differentiable gating mechanism, the framework guides the available modalities, end to end, to distill complementary representations toward the absent ones. Evaluated on standard benchmarks, including MM-IMDb and UR-FUNNY, the approach improves classification accuracy by 5.2%–9.8% over state-of-the-art methods. It demonstrates superior generalizability and robustness across diverse missing-modality scenarios, offering a principled solution to modality-robust multimodal representation learning.
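To give a flavor of how contrastive distillation and a differentiable gate might combine, here is a minimal NumPy sketch. It is not the paper's implementation: the InfoNCE-style loss form, the per-sample sigmoid gate, and all function and parameter names (`gated_distill_loss`, `gate_logits`, `tau`) are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_distill_loss(teacher, student, gate_logits, tau=0.1):
    """Contrastive (InfoNCE-style) distillation sketch, not the paper's code.

    Each student embedding (computed from the available modalities) is pulled
    toward its paired teacher embedding (full-modality) against in-batch
    negatives. A per-sample differentiable gate, sigmoid(gate_logits),
    softly weights how much each pair contributes to the loss.
    """
    # L2-normalize both embedding sets so similarities are cosine-based.
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    sim = s @ t.T / tau                          # (B, B) similarity logits
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability shift
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    nce = -np.diag(log_prob)                     # per-sample InfoNCE loss
    gate = sigmoid(gate_logits)                  # soft gate in (0, 1)
    return float((gate * nce).sum() / gate.sum())
```

When student embeddings align with their teacher counterparts, the loss is near zero; misaligned pairs are penalized, and the gate lets the model down-weight samples where transfer is unreliable (e.g., high modality uncertainty).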