🤖 AI Summary
To address dynamic modality missing—caused by device failures, privacy constraints, and other real-world factors—this paper pioneers modeling missing-modality adaptation as a continual learning problem. We propose a prompt-based continual multimodal framework for missing modalities, innovatively designing three types of prompts: modality-specific, task-aware, and task-specific. Additionally, we introduce a contrastive task interaction strategy to explicitly model cross-modal prompt correlations, thereby effectively mitigating catastrophic forgetting. Crucially, our approach eliminates the need for modality reconstruction or feature alignment modules, significantly reducing computational overhead. Evaluated on three public multimodal benchmarks, our method consistently outperforms state-of-the-art approaches across downstream task performance under missing modalities, inference efficiency, and forgetting suppression—achieving substantial gains in all three dimensions.
📝 Abstract
Missing modality issues are common in real-world applications, arising from factors such as equipment failures and privacy concerns. When fine-tuning pre-trained models on downstream datasets with missing modalities, performance can degrade significantly. Current methods often aggregate various missing cases to train recovery modules or align multimodal features, resulting in suboptimal performance, high computational costs, and the risk of catastrophic forgetting in continual environments where data arrives sequentially. In this paper, we formulate the dynamic missing modality problem as a continual learning task and introduce the continual multimodal missing modality task. To address this challenge efficiently, we introduce three types of prompts: modality-specific, task-aware, and task-specific prompts. These prompts enable the model to learn intra-modality, inter-modality, intra-task, and inter-task features. Furthermore, we propose a contrastive task interaction strategy to explicitly learn prompts correlating different modalities. We conduct extensive experiments on three public datasets, where our method consistently outperforms state-of-the-art approaches.