🤖 AI Summary
Multimodal large language models (MLLMs) face catastrophic forgetting and paradigm fragmentation when continually acquiring new knowledge and capabilities in dynamic real-world scenarios. Method: We introduce the first MLLM continual learning benchmark jointly supporting domain evolution and capability emergence. Our approach unifies IID domain continual learning and non-IID capability continual learning—two previously disjoint paradigms—via a parameter isolation mechanism and an MLLM-driven dynamic routing strategy. It further incorporates multi-stage incremental training and cross-modal knowledge consolidation to mitigate forgetting. Results: Experiments demonstrate an average accuracy improvement of 19.4% on both domain and capability continual learning tasks, alongside a 32.7% gain in knowledge integration efficiency. This work establishes a novel, reproducible paradigm and benchmark for MLLM continual learning.
📝 Abstract
Recent Multimodal Large Language Models (MLLMs) excel in vision-language understanding but face challenges in adapting to dynamic real-world scenarios that require continuous integration of new knowledge and skills. While continual learning (CL) offers a potential solution, existing benchmarks and methods suffer from critical limitations. In this paper, we introduce MLLM-CL, a novel benchmark encompassing domain and ability continual learning, where the former focuses on independently and identically distributed (IID) evaluation across evolving mainstream domains, whereas the latter evaluates on non-IID scenarios with emerging model ability. Methodologically, we propose preventing catastrophic interference through parameter isolation, along with an MLLM-based routing mechanism. Extensive experiments demonstrate that our approach can integrate domain-specific knowledge and functional abilities with minimal forgetting, significantly outperforming existing methods.