🤖 AI Summary
To address performance degradation caused by negative co-learning (NCL) and poor generalization under unimodal degradation (e.g., missing image or text inputs) in multimodal learning, this paper proposes Aggressive Modality Dropout (AMD)—a training strategy that empirically demonstrates the reversal of NCL into positive co-learning (PCL). AMD integrates seamlessly into existing multimodal co-learning frameworks and requires no additional parameters. Experiments show that AMD improves model accuracy by up to 20% in the NCL regime, enhances robustness and generalization under unimodal input conditions, and enables efficient unimodal deployment. This work offers a new lens on multimodal collaboration mechanisms and facilitates lightweight, practical deployment of multimodal models.
📝 Abstract
This paper documents an effective way to improve multimodal co-learning: aggressive modality dropout. We find that aggressive modality dropout can reverse negative co-learning (NCL) into positive co-learning (PCL). Aggressive modality dropout can be used to "prep" a multimodal model for unimodal deployment, and it dramatically increases model performance during negative co-learning—in some experiments we saw a 20% gain in accuracy. We also benchmark our modality dropout technique under PCL and show that it improves co-learning there as well, although the effect is not as substantial as it is during NCL. GitHub: https://github.com/nmagal/modality_drop_for_colearning
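To make the core idea concrete, here is a minimal sketch of modality dropout: during training, one modality's features are zeroed out with high probability, forcing the model to rely on the surviving modality. The function name, the feature representation (plain lists), and the specific drop probability are illustrative assumptions, not the paper's exact implementation.

```python
import random

def aggressive_modality_dropout(image_feat, text_feat, p_drop=0.9, rng=random):
    """Zero out one randomly chosen modality with probability p_drop.

    A high p_drop is what makes the dropout "aggressive": most training
    examples are effectively unimodal, which the paper reports can turn
    negative co-learning into positive co-learning and prepares the model
    for unimodal deployment. Values here are illustrative assumptions.
    """
    if rng.random() < p_drop:
        # Drop one modality at random; the other must carry the
        # prediction on its own for this training example.
        if rng.random() < 0.5:
            image_feat = [0.0] * len(image_feat)
        else:
            text_feat = [0.0] * len(text_feat)
    return image_feat, text_feat
```

In practice this would be applied per batch inside the training loop of a multimodal model (e.g., on the image and text embeddings before fusion), with the drop probability treated as a hyperparameter.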