Continual Learning in Vision-Language Models via Aligned Model Merging

📅 2025-05-30

📈 Citations: 0

✨ Influential: 0

career value

202K/year

🤖 AI Summary

To address the stability-plasticity dilemma in vision-language models (VLMs) arising from sequential fine-tuning in continual learning, this paper proposes a weight-alignment-based model fusion approach—the first to introduce model fusion into VLM continual learning. Our method explicitly aligns the semantic weight spaces across tasks, mitigating parameter interference during fusion and thereby simultaneously preserving knowledge stability for previously learned tasks and enabling plasticity for new tasks—without requiring experience replay or regularization. Experiments demonstrate substantial mitigation of catastrophic forgetting, enhanced robustness on multi-task sequences and high-similarity task settings, and improved cross-task generalization. The core innovation lies in replacing sequential parameter updates with an alignment-driven fusion mechanism, establishing a novel paradigm for continual learning in VLMs.

Technology Category

Application Category

📝 Abstract

Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.

Problem

Research questions and friction points this paper is trying to address.

Addresses catastrophic forgetting in continual learning

Balances stability and plasticity via aligned model merging

Enhances vision-language models' robustness and generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligned model merging for continual learning

Balances stability and plasticity via merging

Promotes aligned weights to avoid interference

🔎 Similar Papers

Non-autoregressive Sequence-to-Sequence Vision-Language Models