Multimodal Continual Instruction Tuning with Dynamic Gradient Guidance

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Catastrophic forgetting, the significant degradation of performance on previously learned tasks during continual learning, plagues multimodal continual instruction tuning. To address this, we propose Dynamic Gradient Guidance (DGG), which attributes forgetting to the absence of old-task gradients during new-task parameter updates. DGG approximates these missing gradients geometrically, using the direction in weight space from the current parameters toward the previously optimal parameters, and combines this guidance with real gradients from a limited replay buffer under a Bernoulli sampling mechanism that dynamically balances stability and plasticity. Crucially, DGG requires no model expansion and incurs zero inference overhead. On established multimodal continual instruction tuning benchmarks, DGG achieves state-of-the-art performance, substantially mitigating catastrophic forgetting while preserving strong generalization within a compact model architecture.

📝 Abstract
Multimodal continual instruction tuning enables multimodal large language models to sequentially adapt to new tasks while building upon previously acquired knowledge. However, this continual learning paradigm faces the significant challenge of catastrophic forgetting, where learning new tasks leads to performance degradation on previous ones. In this paper, we introduce a novel insight into catastrophic forgetting by conceptualizing it as a problem of missing gradients from old tasks during new task learning. Our approach approximates these missing gradients by leveraging the geometric properties of the parameter space, specifically using the directional vector between current parameters and previously optimal parameters as gradient guidance. This approximated gradient can be further integrated with real gradients from a limited replay buffer and regulated by a Bernoulli sampling strategy that dynamically balances model stability and plasticity. Extensive experiments on multimodal continual instruction tuning datasets demonstrate that our method achieves state-of-the-art performance without model expansion, effectively mitigating catastrophic forgetting while maintaining a compact architecture.
Problem

Research questions and friction points this paper is trying to address.

Addressing catastrophic forgetting in multimodal continual learning models
Approximating missing gradients from old tasks using geometric properties
Balancing model stability and plasticity during sequential task adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Approximates missing gradients using geometric parameter properties
Integrates the approximated gradient with real gradients from a limited replay buffer
Dynamically balances stability and plasticity via Bernoulli sampling
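The mechanism above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the guidance weight `alpha`, the Bernoulli rate `p`, the learning rate, and the way the replay gradient is averaged into the guidance are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dgg_update(theta, grad_new, theta_prev_opt, replay_grad=None,
               lr=0.1, alpha=0.5, p=0.5):
    """One DGG-style parameter update (illustrative sketch).

    theta          : current parameters
    grad_new       : gradient of the new-task loss at theta
    theta_prev_opt : parameters that were optimal for previous tasks
    replay_grad    : optional real gradient from a small replay buffer
    alpha, p       : guidance weight and Bernoulli gating rate (assumed)
    """
    # Approximate the missing old-task gradient by the direction in weight
    # space from the current parameters toward the previous optimum.
    guidance = theta - theta_prev_opt

    # Optionally fold in a real gradient computed on the replay buffer.
    if replay_grad is not None:
        guidance = 0.5 * (guidance + replay_grad)

    # Bernoulli sampling: apply the guidance only with probability p,
    # trading stability (guidance on) against plasticity (guidance off).
    if rng.random() < p:
        grad = grad_new + alpha * guidance
    else:
        grad = grad_new

    return theta - lr * grad

# Toy usage: quadratic losses standing in for an old and a new task.
theta_prev_opt = np.array([1.0, 1.0])   # optimum of the old task
new_opt = np.array([3.0, -1.0])         # optimum of the new task
theta = theta_prev_opt.copy()           # start from the old optimum
for _ in range(100):
    grad_new = theta - new_opt          # grad of 0.5*||theta - new_opt||^2
    theta = dgg_update(theta, grad_new, theta_prev_opt, p=0.5)
```

In this toy run the parameters settle between the two optima rather than collapsing onto the new one, which is the intended stability/plasticity trade-off; raising `p` or `alpha` pulls the solution further back toward the old optimum.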