Continual-NExT: A Unified Comprehension And Generation Continual Learning Framework

📅 2026-02-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses critical challenges in continual learning for Dual-to-Dual multimodal large language models (MLLMs), including catastrophic forgetting, hallucination, instruction non-compliance, and ineffective cross-modal knowledge transfer. To tackle these issues, the authors propose Continual-NExT, the first unified continual learning framework for such models, featuring a novel Mixture and Aggregation of General and Expert LoRA (MAGE) mechanism. By dynamically integrating general-purpose and task-specific LoRA modules, MAGE enhances cross-modal knowledge transfer while mitigating forgetting. The study also introduces the first continual learning evaluation benchmark tailored for unified multimodal understanding and generation models. Experimental results demonstrate that MAGE significantly outperforms existing methods across multiple metrics, improving both cross-task and cross-modal adaptability.
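To make the mechanism concrete, below is a minimal PyTorch sketch of a MAGE-style adapter layer: a frozen base projection augmented with one shared "general" LoRA plus a pool of task-specific "expert" LoRAs whose outputs are mixed by a learned router. The class names (`LoRA`, `MAGELinear`), the softmax gating, and all hyperparameters are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class LoRA(nn.Module):
    """Low-rank adapter: x -> (alpha / r) * (x A^T) B^T."""
    def __init__(self, dim_in, dim_out, r=8, alpha=16.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, dim_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(dim_out, r))  # zero-init so the adapter starts as identity
        self.scale = alpha / r

    def forward(self, x):
        return (x @ self.A.T) @ self.B.T * self.scale


class MAGELinear(nn.Module):
    """Frozen base linear layer + shared general LoRA + routed expert LoRAs (illustrative)."""
    def __init__(self, base: nn.Linear, num_experts=4, r=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep pretrained weights intact to limit forgetting
        d_in, d_out = base.in_features, base.out_features
        self.general = LoRA(d_in, d_out, r)                                   # shared across tasks/modalities
        self.experts = nn.ModuleList(LoRA(d_in, d_out, r) for _ in range(num_experts))
        self.router = nn.Linear(d_in, num_experts)                            # token-wise gating

    def forward(self, x):
        gate = torch.softmax(self.router(x), dim=-1)                          # (..., num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)        # (..., d_out, num_experts)
        mixed = (expert_out * gate.unsqueeze(-2)).sum(dim=-1)                 # aggregate expert outputs
        return self.base(x) + self.general(x) + mixed                         # base + general + expert mixture


# Usage: wrap a hidden projection and run a (batch, seq, hidden) tensor through it.
layer = MAGELinear(nn.Linear(4096, 4096), num_experts=4, r=8)
y = layer(torch.randn(2, 16, 4096))
```

In this reading, the general LoRA accumulates knowledge reused across tasks, while per-task expert LoRAs absorb task-specific updates and are aggregated at inference; how the paper actually allocates and merges experts may differ.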

📝 Abstract
Dual-to-Dual MLLMs are Multimodal Large Language Models that unify multimodal comprehension and generation across text and image modalities. Although they exhibit strong instantaneous learning and generalization capabilities, Dual-to-Dual MLLMs remain deficient in lifelong evolution, which significantly limits continual adaptation to dynamic real-world scenarios. One challenge is that learning new tasks inevitably overwrites previously learned knowledge. Beyond traditional catastrophic forgetting, Dual-to-Dual MLLMs face further challenges, including hallucination, instruction non-compliance, and failures in cross-modal knowledge transfer. However, no standardized continual learning framework for Dual-to-Dual MLLMs has been established, leaving these challenges unexplored. In this paper, we therefore establish Continual-NExT, a continual learning framework for Dual-to-Dual MLLMs with deliberately architected evaluation metrics. To improve the continual learning capability of Dual-to-Dual MLLMs, we propose an efficient MAGE (Mixture and Aggregation of General LoRA and Expert LoRA) method that facilitates knowledge transfer across modalities and mitigates forgetting. Extensive experiments demonstrate that MAGE outperforms other continual learning methods and achieves state-of-the-art performance.
Problem

Research questions and friction points this paper is trying to address.

Continual Learning
Multimodal Large Language Models
Catastrophic Forgetting
Cross-modal Knowledge Transfer
Hallucination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Learning
Multimodal Large Language Models
LoRA
Knowledge Transfer
Catastrophic Forgetting