🤖 AI Summary
This work addresses two coupled problems in continual model merging: parameter saturation in backbone-centric methods and expert redundancy in Mixture-of-Experts variants, along with the routing bottleneck caused by data-driven gating. The authors propose MADE-IT, which uses a manifold-aware dynamic expert evolution mechanism to add or remove experts autonomously, guided by a projection-based subspace affinity metric and a distribution-aware adaptive threshold. MADE-IT also introduces a data-free, training-free implicit routing strategy that activates experts through feature-subspace alignment. Experiments show that MADE-IT consistently outperforms strong baselines in accuracy and robustness on long-horizon and shuffled task sequences while substantially reducing expert redundancy, particularly in generic modules and early layers.
📝 Abstract
Continual Model Merging (CMM) sequentially integrates task-specific models into a unified architecture without intensive retraining. However, existing CMM methods are hindered by a fundamental saturation-redundancy dilemma: backbone-centric approaches face parameter saturation and representation interference within fixed capacities, whereas Mixture-of-Experts (MoE) variants resort to indiscriminate expansion, incurring expert redundancy and a routing bottleneck reliant on additional data-driven optimization. To resolve these challenges, we propose MADE-IT (Manifold-Aware Dynamic Expert Evolution and Implicit rouTing), an adaptive CMM method that orchestrates expert management and activation by grounding intrinsic expert representations in manifold geometry. We introduce a projection-based subspace affinity metric coupled with a distribution-aware adaptive threshold mechanism to guide autonomous expert evolution, harmonizing diversity with architectural parsimony. Furthermore, to bypass parameterized gating networks, we design a data-free and training-free implicit routing mechanism that activates experts via feature-subspace alignment. Extensive experiments demonstrate that MADE-IT consistently outperforms strong baselines in accuracy and robustness across long-horizon and shuffled task sequences, while significantly pruning redundant experts, particularly within generic modules and early layers.
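The abstract names three mechanisms without giving formulas: a projection-based subspace affinity, a distribution-aware adaptive threshold, and implicit routing via feature-subspace alignment. The sketch below is one plausible reading of those ideas, not the authors' implementation. It assumes each expert is summarized by an orthonormal basis (a list of unit vectors), measures affinity as the normalized squared projection overlap, sets the threshold to the mean plus `k` standard deviations of previously observed affinities, and routes a feature to the expert whose subspace captures the largest share of its energy. All function names, the threshold form, and the parameters `k` and `default` are hypothetical. Plain Python lists are used to keep the example dependency-free.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def subspace_affinity(basis_a, basis_b):
    """Normalized projection overlap ||A^T B||_F^2 / min(rank_a, rank_b).

    Assumes both bases are orthonormal. The score lies in [0, 1]:
    0 for orthogonal subspaces, 1 when the smaller one is contained
    in the larger one. (Assumed form; the abstract gives no formula.)
    """
    overlap = sum(dot(a, b) ** 2 for a in basis_a for b in basis_b)
    return overlap / min(len(basis_a), len(basis_b))


def adaptive_threshold(history, k=1.0, default=0.5):
    """Assumed threshold: mean + k * std of past affinities, clipped to [0, 1]."""
    if len(history) < 2:
        return default
    mean = sum(history) / len(history)
    std = math.sqrt(sum((h - mean) ** 2 for h in history) / len(history))
    return min(1.0, max(0.0, mean + k * std))


def decide_expert(new_basis, expert_bases, threshold):
    """Reuse the most similar expert if its affinity clears the threshold,
    otherwise signal that a new expert should be added."""
    if not expert_bases:
        return ("add", 0)
    affinities = [subspace_affinity(new_basis, b) for b in expert_bases]
    best = max(range(len(affinities)), key=affinities.__getitem__)
    if affinities[best] >= threshold:
        return ("reuse", best)
    return ("add", len(expert_bases))


def route(feature, expert_bases):
    """Data-free routing: pick the expert whose subspace holds the largest
    fraction of the feature's squared norm. No gating parameters are trained."""
    energy = dot(feature, feature) or 1.0  # guard against a zero feature
    scores = [
        sum(dot(b, feature) ** 2 for b in basis) / energy
        for basis in expert_bases
    ]
    return max(range(len(scores)), key=scores.__getitem__), scores


# Toy usage: two experts spanning the xy-plane and the z-axis in R^3.
xy_plane = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
z_axis = [[0.0, 0.0, 1.0]]
experts = [xy_plane, z_axis]

print(decide_expert([[1.0, 0.0, 0.0]], experts, threshold=0.5))  # ('reuse', 0)
print(route([0.1, 0.2, 3.0], experts)[0])                        # 1
```

In this toy setting, a new task whose subspace lies inside the xy-plane is absorbed by the existing expert rather than spawning a new one, which mirrors the parsimony goal the abstract describes. How the actual method extracts subspaces from task-specific weights and calibrates its threshold is not stated in the abstract.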