AI Summary
To address catastrophic forgetting in large multimodal models (LMMs) during continual learning of new tasks in dynamic scenarios, this paper proposes ModalPrompt, a dual-modality guided prompt learning framework. Unlike existing approaches that rely on data replay or model expansion, ModalPrompt introduces a novel image-text jointly supervised task-prototype prompting mechanism, integrated with cross-modal prompt selection and lightweight prompt fusion, enabling efficient and scalable multimodal continual instruction tuning (MCIT). Crucially, ModalPrompt adds only a minimal number of learnable parameters, so training cost remains constant regardless of the number of tasks. Evaluated on standard LMM continual learning benchmarks, ModalPrompt achieves an average +20% performance gain and 1.42× faster inference, striking a superior balance among accuracy, efficiency, and scalability.
Abstract
Large Multimodal Models (LMMs) exhibit remarkable multi-tasking ability by learning mixed datasets jointly. However, novel tasks are encountered sequentially in a dynamic world, and continually fine-tuning LMMs often leads to performance degradation. To handle the challenge of catastrophic forgetting, existing methods leverage data replay or model expansion, both of which are not specially developed for LMMs and have inherent limitations. In this paper, we propose a novel dual-modality guided prompt learning framework (ModalPrompt) tailored for multimodal continual learning to effectively learn new tasks while alleviating forgetting of previous knowledge. Concretely, we learn prototype prompts for each task and exploit efficient prompt selection for task identification and prompt fusion for knowledge transfer based on image-text supervision. Extensive experiments demonstrate the superiority of our approach, e.g., ModalPrompt achieves a +20% performance gain on LMM continual learning benchmarks with $\times 1.42$ inference speed while refraining from growing training cost in proportion to the number of tasks. The code will be made publicly available.
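The selection-and-fusion idea in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, feature shapes, and the cosine-plus-softmax scoring are all assumptions made for illustration.

```python
import numpy as np

def select_and_fuse_prompts(image_feat, text_feat, prototypes, top_k=2):
    """Hypothetical sketch of dual-modality prompt selection and fusion.

    image_feat, text_feat: (d,) features from the frozen image/text encoders.
    prototypes: (num_tasks, d) one learned prototype prompt per seen task.
    Returns a fused prompt built from the top-k most relevant prototypes.
    """
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    # Dual-modality score: each task prototype is rated against both
    # the image and the text features (image-text supervision).
    scores = np.array(
        [cosine(p, image_feat) + cosine(p, text_feat) for p in prototypes]
    )

    # Prompt selection: keep only the top-k candidate task prototypes,
    # acting as a lightweight task identifier.
    idx = np.argsort(scores)[-top_k:]

    # Prompt fusion: softmax-weighted combination of the selected
    # prototypes transfers knowledge across related tasks.
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()
    return (w[:, None] * prototypes[idx]).sum(axis=0)
```

Because only the small prototype bank is trainable and the backbone stays frozen, the number of learnable parameters (and hence training cost) stays flat as tasks accumulate.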