PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning

📅 2025-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance degradation and catastrophic forgetting induced by missing modalities in multi-modal class-incremental learning (MMCIL), this paper proposes PAL (Prompting Analytic Learning), a rehearsal-free framework. PAL compensates for missing modalities via modality-specific prompts and reformulates incremental learning as a recursive least-squares optimization problem with an analytical linear solution, jointly preserving a holistic representation of the data and mitigating forgetting without storing exemplars. The method integrates modality-aware prompt learning, analytic linear optimization, multi-modal feature alignment, and cross-modal reconstruction. Evaluated on benchmarks including UPMC-Food101 and N24News under modality-missing settings, PAL improves incremental accuracy by over 12% and reduces the forgetting rate by 37%, significantly outperforming state-of-the-art approaches.

📝 Abstract
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs, thereby enabling models to learn continuously across a sequence of tasks while mitigating forgetting. While existing studies primarily focus on the integration and utilization of multi-modal information for MMCIL, a critical challenge remains: the issue of missing modalities during incremental learning phases. This oversight can exacerbate severe forgetting and significantly impair model performance. To bridge this gap, we propose PAL, a novel exemplar-free framework tailored to MMCIL under missing-modality scenarios. Concretely, we devise modality-specific prompts to compensate for missing information, enabling the model to maintain a holistic representation of the data. On this foundation, we reformulate the MMCIL problem into a Recursive Least-Squares task, delivering an analytical linear solution. Building on these components, PAL not only alleviates the inherent under-fitting limitation of analytic learning but also preserves the holistic representation of missing-modality data, achieving superior performance with less forgetting across various multi-modal incremental scenarios. Extensive experiments demonstrate that PAL significantly outperforms competitive methods across various datasets, including UPMC-Food101 and N24News, showcasing its robustness to modality absence and its anti-forgetting ability to maintain high incremental accuracy.
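The Recursive Least-Squares reformulation mentioned in the abstract can be sketched in a few lines. The following is a minimal illustration of the analytic-learning idea that PAL builds on, not the paper's implementation: a linear classifier head whose ridge-regression solution is maintained exactly across tasks via a block RLS update, so no past exemplars need to be stored. The class name and hyperparameters are hypothetical.

```python
import numpy as np

class AnalyticIncrementalClassifier:
    """Linear head updated by block recursive least squares (RLS).

    Maintains the exact ridge solution W = (gamma*I + X^T X)^{-1} X^T Y
    over all tasks seen so far, updating task by task without replaying
    old data. Illustrative sketch only.
    """

    def __init__(self, feat_dim, num_classes, gamma=1.0):
        # R tracks the inverse regularized Gram matrix (gamma*I + sum X^T X)^{-1}.
        self.R = np.eye(feat_dim) / gamma
        self.W = np.zeros((feat_dim, num_classes))

    def update(self, X, Y):
        """Absorb one task's features X (n x d) and one-hot labels Y (n x c)."""
        # Woodbury-style block update of the inverse Gram matrix.
        S = np.linalg.inv(np.eye(X.shape[0]) + X @ self.R @ X.T)
        self.R = self.R - self.R @ X.T @ S @ X @ self.R
        # Correct the weights using only the new task's residual.
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X):
        return (X @ self.W).argmax(axis=1)
```

Because the recursion reproduces the joint ridge-regression solution exactly, the linear head itself incurs no forgetting; the paper's contribution lies in combining this analytic update with prompts that compensate for missing modalities upstream.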
Problem

Research questions and friction points this paper is trying to address.

Multi-modal class-incremental learning
Performance degradation under missing modalities
Catastrophic forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

PAL
Multi-modal class-incremental learning
Missing Modality Handling
Xianghu Yue
Tianjin University
speech processing, self-supervised learning, multi-modal learning
Yiming Chen
Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583
Xueyi Zhang
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, 410073, China
Xiaoxue Gao
Research Scientist, I2R, A*STAR; National University of Singapore; IEEE Senior Member
Generative AI, Speech, Large language models
Mengling Feng
Saw Swee Hock School of Public Health, National University of Singapore, Singapore 117583
Mingrui Lao
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, 410073, China
Huiping Zhuang
Associate Professor, South China University of Technology
Continual Learning, Multi-Modal, Embodied AI, Large Model
Haizhou Li
The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China; NUS, Singapore
Automatic Speech Recognition, Speaker Recognition, Language Recognition, Voice Conversion, Machine Translation