Continual-NExT: A Unified Comprehension And Generation Continual Learning Framework

📅 2026-02-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses critical challenges in continual learning for Dual-to-Dual multimodal large language models (MLLMs), including catastrophic forgetting, hallucination, instruction non-compliance, and ineffective cross-modal knowledge transfer. To tackle these issues, the authors propose Continual-NExT, the first unified continual learning framework for such models, featuring a novel Mixture and Aggregation of General and Expert LoRA (MAGE) mechanism. By dynamically integrating general-purpose and task-specific LoRA modules, MAGE enhances cross-modal knowledge transfer while mitigating forgetting. The study also introduces the first continual learning evaluation benchmark tailored for unified multimodal understanding and generation models. Experimental results demonstrate that MAGE significantly outperforms existing methods across multiple metrics, improving both cross-task and cross-modal adaptability.
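To make the mechanism concrete, below is a minimal PyTorch sketch of a MAGE-style adapter layer: a frozen base projection augmented with one shared "general" LoRA plus a pool of task-specific "expert" LoRAs whose outputs are mixed by a learned router. The class names (`LoRA`, `MAGELinear`), the softmax gating, and all hyperparameters are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class LoRA(nn.Module):
    """Low-rank adapter: x -> (alpha / r) * (x A^T) B^T."""
    def __init__(self, dim_in, dim_out, r=8, alpha=16.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, dim_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(dim_out, r))  # zero-init so the adapter starts as identity
        self.scale = alpha / r

    def forward(self, x):
        return (x @ self.A.T) @ self.B.T * self.scale


class MAGELinear(nn.Module):
    """Frozen base linear layer + shared general LoRA + routed expert LoRAs (illustrative)."""
    def __init__(self, base: nn.Linear, num_experts=4, r=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep pretrained weights intact to limit forgetting
        d_in, d_out = base.in_features, base.out_features
        self.general = LoRA(d_in, d_out, r)                                   # shared across tasks/modalities
        self.experts = nn.ModuleList(LoRA(d_in, d_out, r) for _ in range(num_experts))
        self.router = nn.Linear(d_in, num_experts)                            # token-wise gating

    def forward(self, x):
        gate = torch.softmax(self.router(x), dim=-1)                          # (..., num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)        # (..., d_out, num_experts)
        mixed = (expert_out * gate.unsqueeze(-2)).sum(dim=-1)                 # aggregate expert outputs
        return self.base(x) + self.general(x) + mixed                         # base + general + expert mixture


# Usage: wrap a hidden projection and run a (batch, seq, hidden) tensor through it.
layer = MAGELinear(nn.Linear(4096, 4096), num_experts=4, r=8)
y = layer(torch.randn(2, 16, 4096))
```

In this reading, the general LoRA accumulates knowledge reused across tasks, while per-task expert LoRAs absorb task-specific updates and are aggregated at inference; how the paper actually allocates and merges experts may differ.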

📝 Abstract
Dual-to-Dual MLLMs are Multimodal Large Language Models that unify multimodal comprehension and generation across text and image modalities. Although they exhibit strong instantaneous learning and generalization capabilities, Dual-to-Dual MLLMs remain deficient in lifelong evolution, which significantly limits continual adaptation to dynamic real-world scenarios. One challenge is that learning new tasks inevitably overwrites previously learned knowledge. Beyond traditional catastrophic forgetting, Dual-to-Dual MLLMs face further challenges, including hallucination, instruction non-compliance, and failures in cross-modal knowledge transfer. However, no standardized continual learning framework for Dual-to-Dual MLLMs has been established, leaving these challenges unexplored. In this paper, we therefore establish Continual-NExT, a continual learning framework for Dual-to-Dual MLLMs with deliberately architected evaluation metrics. To improve the continual learning capability of Dual-to-Dual MLLMs, we propose an efficient MAGE (Mixture and Aggregation of General LoRA and Expert LoRA) method that facilitates knowledge transfer across modalities and mitigates forgetting. Extensive experiments demonstrate that MAGE outperforms other continual learning methods and achieves state-of-the-art performance.
Problem

Research questions and friction points this paper is trying to address.

Continual Learning
Multimodal Large Language Models
Catastrophic Forgetting
Cross-modal Knowledge Transfer
Hallucination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Learning
Multimodal Large Language Models
LoRA
Knowledge Transfer
Catastrophic Forgetting