Towards Adaptive Continual Model Merging via Manifold-Aware Expert Evolution

📅 2026-04-24

🤖 AI Summary
This work addresses the co-occurring challenges of backbone parameter saturation and expert redundancy in continual model merging, as well as the routing bottleneck caused by data-driven gating. To this end, the authors propose MADE-IT, a manifold-aware dynamic expert evolution mechanism that autonomously adds or removes experts based on a projection-based subspace affinity metric and a distribution-aware adaptive threshold. MADE-IT also introduces a data-free, training-free implicit routing strategy that activates experts through feature-subspace alignment. Experiments show that MADE-IT consistently outperforms strong baselines in accuracy and robustness on long-horizon and shuffled task sequences while substantially reducing expert redundancy, particularly within generic modules and early layers.

📝 Abstract
Continual Model Merging (CMM) sequentially integrates task-specific models into a unified architecture without intensive retraining. However, existing CMM methods are hindered by a fundamental saturation-redundancy dilemma: backbone-centric approaches face parameter saturation and representation interference within fixed capacities, whereas Mixture-of-Experts (MoE) variants resort to indiscriminate expansion, incurring expert redundancy and a routing bottleneck reliant on additional data-driven optimization. To resolve these challenges, we propose MADE-IT (Manifold-Aware Dynamic Expert Evolution and Implicit rouTing), an adaptive CMM method that orchestrates expert management and activation by grounding intrinsic expert representations in manifold geometry. We introduce a projection-based subspace affinity metric coupled with a distribution-aware adaptive threshold mechanism to guide autonomous expert evolution, harmonizing diversity with architectural parsimony. Furthermore, to bypass parameterized gating networks, we design a data-free and training-free implicit routing mechanism that activates experts via feature-subspace alignment. Extensive experiments demonstrate that MADE-IT consistently outperforms strong baselines in accuracy and robustness across long-horizon and shuffled task sequences, while significantly pruning redundant experts, particularly within generic modules and early layers.
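
The abstract names three ingredients without giving formulas: a projection-based subspace affinity, a distribution-aware adaptive threshold, and implicit routing via feature-subspace alignment. The sketch below is one plausible reading of those ideas, not the authors' implementation. Everything in it is an assumption made for illustration: representing each expert by the top singular vectors of a weight matrix, the normalized Frobenius overlap used as affinity, the mean-plus-standard-deviation threshold, and all function names.

```python
import numpy as np


def orthonormal_basis(weight, rank):
    """Top-`rank` left singular vectors of a weight matrix.

    Assumption: an expert's subspace is spanned by its leading singular vectors.
    """
    u, _, _ = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank]


def subspace_affinity(basis_a, basis_b):
    """Projection overlap between two orthonormal bases, in [0, 1].

    Assumption: affinity = ||A^T B||_F^2 / min(rank_a, rank_b).
    1.0 means one subspace contains the other, 0.0 means they are orthogonal.
    """
    k = min(basis_a.shape[1], basis_b.shape[1])
    return float(np.linalg.norm(basis_a.T @ basis_b, "fro") ** 2 / k)


def adaptive_threshold(observed_affinities, scale=1.0):
    """Hypothetical distribution-aware threshold: mean + scale * std."""
    values = np.asarray(observed_affinities, dtype=float)
    return float(values.mean() + scale * values.std())


def is_redundant(new_basis, expert_bases, threshold):
    """Treat a new expert as redundant if it overlaps strongly with any existing expert."""
    return any(subspace_affinity(new_basis, b) >= threshold for b in expert_bases)


def route(feature, expert_bases):
    """Pick the expert whose subspace captures the most energy of `feature`.

    No gating parameters and no training data are involved, only projections.
    """
    energy = float(np.dot(feature, feature))
    if energy == 0.0:
        raise ValueError("cannot route a zero feature vector")
    scores = [float(np.linalg.norm(b.T @ feature) ** 2 / energy) for b in expert_bases]
    return int(np.argmax(scores)), scores


# Toy example in R^4: two experts living in orthogonal 2-D subspaces.
expert_0 = np.eye(4)[:, :2]
expert_1 = np.eye(4)[:, 2:]
chosen, scores = route(np.array([0.0, 0.0, 2.0, 0.0]), [expert_0, expert_1])
```

In this toy setup the feature lies entirely in the second expert's subspace, so `chosen` is `1`. A candidate expert whose basis spans the same directions as `expert_0` would have affinity 1.0 with it and would be flagged by `is_redundant` for any threshold at or below 1.0, which mirrors the abstract's goal of avoiding indiscriminate expansion.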
Problem

Research questions and friction points this paper is trying to address.

Continual Model Merging
parameter saturation
expert redundancy
routing bottleneck
representation interference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Model Merging
Manifold-Aware
Dynamic Expert Evolution
Implicit Routing
Mixture-of-Experts