🤖 AI Summary
This paper addresses catastrophic forgetting and impaired cross-modal coordination in multimodal large language models (MLLMs) during continual instruction tuning. To tackle these challenges, the authors introduce MCITlib, an open-source code library designed for multimodal continual instruction tuning. It uniformly implements eight representative continual learning algorithms and systematically evaluates them on two carefully selected multimodal continual instruction benchmarks, providing a standardized basis for comparing how well each method mitigates forgetting and handles cross-modal interaction. The codebase is publicly released and will be continuously updated to reflect advances in the field. MCITlib thus offers shared technical infrastructure and a common benchmark for advancing research in multimodal continual learning.
📝 Abstract
Continual learning aims to equip AI systems with the ability to continuously acquire and adapt to new knowledge without forgetting previously learned information, similar to human learning. While traditional continual learning methods focusing on unimodal tasks have achieved notable success, the emergence of Multimodal Large Language Models has brought increasing attention to Multimodal Continual Learning tasks involving multiple modalities, such as vision and language. In this setting, models are expected not only to mitigate catastrophic forgetting but also to handle the challenges posed by cross-modal interactions and coordination. To facilitate research in this direction, we introduce MCITlib, a comprehensive and constantly evolving code library for continual instruction tuning of Multimodal Large Language Models. In MCITlib, we have currently implemented 8 representative algorithms for Multimodal Continual Instruction Tuning and systematically evaluated them on 2 carefully selected benchmarks. MCITlib will be continuously updated to reflect advances in the Multimodal Continual Learning field. The codebase is released at https://github.com/Ghy0501/MCITlib.