🤖 AI Summary
To address the performance degradation and catastrophic forgetting caused by missing modalities in multimodal class-incremental learning (MMCIL), this paper proposes Prompt-based Adaptive Learning (PAL), a rehearsal-free framework. PAL compensates for missing modalities via modality-specific prompts and reformulates incremental learning as an analytical recursive least-squares optimization problem, achieving, for the first time without exemplars, both preservation of holistic representations and mitigation of forgetting. The method integrates modality-aware prompt learning, analytical linear optimization, multimodal feature alignment, and cross-modal reconstruction. Evaluated on benchmarks including UPMC-Food101 and N24News under modality-missing settings, PAL improves incremental accuracy by over 12% and reduces the forgetting rate by 37%, significantly outperforming state-of-the-art approaches.
📝 Abstract
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs, to enable models to learn continuously across a sequence of tasks while mitigating forgetting. While existing studies primarily focus on the integration and utilization of multi-modal information for MMCIL, a critical challenge remains: missing modalities during incremental learning phases. This oversight can exacerbate severe forgetting and significantly impair model performance. To bridge this gap, we propose PAL, a novel exemplar-free framework tailored to MMCIL under missing-modality scenarios. Concretely, we devise modality-specific prompts to compensate for missing information, enabling the model to maintain a holistic representation of the data. On this foundation, we reformulate the MMCIL problem as a Recursive Least-Squares task, yielding an analytical linear solution. Building upon these components, PAL not only alleviates the inherent under-fitting limitation of analytic learning but also preserves the holistic representation of missing-modality data, achieving superior performance with less forgetting across various multi-modal incremental scenarios. Extensive experiments demonstrate that PAL significantly outperforms competitive methods across various datasets, including UPMC-Food101 and N24News, showcasing its robustness to modality absence and its anti-forgetting ability to maintain high incremental accuracy.
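To make the recursive least-squares reformulation concrete, the sketch below shows the generic exemplar-free RLS update that analytic class-incremental methods build on: a classifier head is refit per task from features and labels alone, with the inverse autocorrelation matrix carried forward via the Woodbury identity so no past samples are stored. This is a minimal illustration of the underlying math, not the paper's actual PAL implementation; the function names and the ridge parameter `gamma` are assumptions for the sketch.

```python
import numpy as np

def init_state(feat_dim, num_classes, gamma=1.0):
    # R approximates (gamma * I + X^T X)^{-1}; W is the linear classifier head.
    return np.eye(feat_dim) / gamma, np.zeros((feat_dim, num_classes))

def rls_update(R, W, X, Y):
    """One exemplar-free task update from features X (n, d) and one-hot labels Y (n, c).

    The Woodbury identity updates the stored inverse without ever
    revisiting data from earlier tasks.
    """
    K = np.linalg.inv(np.eye(X.shape[0]) + X @ R @ X.T)
    R = R - R @ X.T @ K @ X @ R            # new (gamma*I + sum X^T X)^{-1}
    W = W + R @ X.T @ (Y - X @ W)          # correct W toward the new task's labels
    return R, W
```

After sequential updates over all tasks, `W` equals the ridge-regression solution that joint training on the pooled data would give, which is why this family of methods can claim analytically zero forgetting for the linear head.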