Unlocking the Potential of Continual Model Merging: An ODE Perspective

📅 2026-05-19

🤖 AI Summary
This work addresses a limitation of existing continual model merging methods: they lack explicit control over how learning capacity is allocated across tasks, so they suffer severe catastrophic forgetting when tasks differ in importance. The paper formulates model merging as the continuous evolution of a path in parameter space, governed by an ordinary differential equation (ODE). By integrating a time-dependent velocity field under loss-barrier constraints, the method combines the models of old and new tasks along low-loss trajectories, which makes model evolution controllable, coherent, and scalable. Experiments on mainstream continual model merging benchmarks show that the approach substantially reduces catastrophic forgetting and achieves state-of-the-art performance, particularly when tasks vary widely in importance.
📝 Abstract
Continual Model Merging (CMM) enables rapid customization of foundation models across sequentially arriving tasks, offering a scalable alternative to repeated retraining. However, existing merging rules lack explicit control over how learning capacity is allocated between previously learned capabilities and newly merged models. As tasks are merged sequentially, this deficiency accumulates into severe forgetting, particularly in scenarios with heterogeneous task importance, where performance allocation becomes highly inconsistent. The root cause is that previous methods treat each task model as an isolated point in parameter space and apply fixed algebraic combinations, rather than explicitly constructing a transition that respects how independently trained models can be connected in parameter space. Motivated by mode connectivity, we assume that desirable merged models lie on low-loss connecting paths, and that continual merging should follow such paths without crossing the loss barriers that induce forgetting. Building on these insights, we propose ODE-driven Merging (ODE-M), a method tailored for CMM that traces such a path by integrating a time-dependent velocity field while enforcing barrier constraints that prevent loss-increasing steps. Extensive experiments show that ODE-M outperforms competing methods and achieves state-of-the-art performance across mainstream CMM benchmarks.
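To make the idea of "integrating a velocity field under a barrier constraint" concrete, here is a toy sketch in Python. It is not ODE-M and does not reproduce the paper's method. The straight-line velocity field (θ_new − θ)/(1 − t), the explicit-Euler integrator, the step-halving barrier check, and the names `ode_merge`, `straight_line_velocity`, and `importance_weighted_loss` are all hypothetical choices made for illustration. The two-task quadratic loss, with the old task weighted three times more heavily than the new one, is invented toy data standing in for heterogeneous task importance.

```python
from typing import Callable, List

Vector = List[float]


def straight_line_velocity(theta: Vector, target: Vector, t: float) -> Vector:
    """Hypothetical time-dependent velocity field: points from theta toward the
    new task model, scaled by 1 / (1 - t) so that the exact ODE solution is the
    linear interpolation path that reaches `target` at t = 1."""
    return [(g - p) / (1.0 - t) for p, g in zip(theta, target)]


def ode_merge(
    theta_old: Vector,
    theta_new: Vector,
    loss: Callable[[Vector], float],
    n_steps: int = 20,
    barrier_tol: float = 0.0,
    max_halvings: int = 3,
) -> Vector:
    """Explicit-Euler integration from theta_old (t = 0) toward theta_new (t = 1).

    Hypothetical barrier rule: a step is accepted only if it raises `loss` by at
    most `barrier_tol`. Otherwise the step size is halved and retried; if every
    trial step fails, integration stops at the current point.
    """
    theta = list(theta_old)
    t, base_dt = 0.0, 1.0 / n_steps
    current = loss(theta)
    while t < 1.0 - 1e-12:
        dt = min(base_dt, 1.0 - t)  # never step past t = 1
        v = straight_line_velocity(theta, theta_new, t)
        for _ in range(max_halvings + 1):
            candidate = [p + dt * vi for p, vi in zip(theta, v)]
            cand_loss = loss(candidate)
            if cand_loss <= current + barrier_tol:
                break  # loss increase is within the barrier tolerance
            dt /= 2.0  # shrink the step and retry
        else:
            return theta  # every trial step crossed the barrier: stop here
        theta, current, t = candidate, cand_loss, t + dt
    return theta


def importance_weighted_loss(theta: Vector) -> float:
    """Toy objective: quadratic losses around an old-task optimum at 0.0 and a
    new-task optimum at 1.0, with the old task weighted 3x more heavily."""
    old_opt, new_opt, w_old, w_new = 0.0, 1.0, 3.0, 1.0
    return sum(w_old * (p - old_opt) ** 2 + w_new * (p - new_opt) ** 2 for p in theta)


merged = ode_merge([0.0], [1.0], importance_weighted_loss)
print(merged)  # close to [0.25], the importance-weighted optimum
```

On this toy objective, a fixed equal-weight average (θ = 0.5) gives a loss of 1.0, whereas the barrier-constrained path stops near θ = 0.25 with a loss of 0.75, because every further step along the path would increase the combined loss. Passing `barrier_tol=float("inf")` disables the check, and the path then runs all the way to the new task model at θ = 1.0.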
Problem

Research questions and friction points this paper is trying to address.

Continual Model Merging
Catastrophic Forgetting
Task Heterogeneity
Learning Capacity Allocation
Parameter Space Connectivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Model Merging
ODE-driven Merging
Mode Connectivity
Forgetting Mitigation
Parameter Space Path