Dynamic Model Merging Made Slim

📅 2026-05-17

🤖 AI Summary

Existing dynamic model merging approaches allocate parameters suboptimally between shared and expert modules and therefore struggle to balance accuracy and efficiency. This work proposes DiDi-Merging, a framework that introduces differentiable rank allocation into dynamic merging for the first time, producing compact multi-task models by optimizing the parameter budget of low-rank modules. The method adds a data-free distillation step to recover task fidelity and supports dynamic expert activation. DiDi-Merging matches prior dynamic baselines using only 1.24× the parameters of a single fine-tuned model and surpasses them at 1.4×, whereas competing methods typically require more than 2× the storage of a single fine-tuned model.

📝 Abstract

Model merging enables the reuse of fine-tuned models without joint training or access to original data. Dynamic merging further improves flexibility by selectively activating task-relevant parameters and efficiently composing experts across multiple tasks. However, existing dynamic methods either maintain a full shared model with tiny experts or allocate excessive capacity to experts, leading to suboptimal accuracy-efficiency trade-offs. To address this, we propose DiDi-Merging, a slim dynamic merging framework that leverages differentiable rank allocation to balance shared and expert parameters. By formulating parameter budgeting as differentiable rank optimization in low-rank modules and introducing a data-free refinement step to recover task fidelity, DiDi-Merging matches prior dynamic baselines at only 1.24× the parameters of a single fine-tuned model and surpasses them at 1.4×, making it substantially more compact than methods requiring more than 2× storage. DiDi-Merging applies across vision, language, and multimodal tasks.
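To make the idea of differentiable rank allocation more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation: the sigmoid gating over singular directions, the hinge-style budget penalty, and every name in it (`soft_rank_gates`, `gated_low_rank`, `low_rank_params`, `budget_penalty`) are assumptions chosen for illustration. The sketch only shows how a "soft rank" can be a continuous quantity that a parameter budget can constrain.

```python
import numpy as np

def soft_rank_gates(logits, temperature=1.0):
    """Sigmoid gates in (0, 1); their sum acts as a continuous 'soft rank'."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float) / temperature))

def gated_low_rank(delta, logits):
    """Rebuild a weight delta from its SVD, scaling each singular direction by a gate."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    gates = soft_rank_gates(logits)
    return (u * (s * gates)) @ vt, gates

def low_rank_params(shape, rank):
    """Parameters stored by a rank-r factorization of an (m, n) matrix."""
    m, n = shape
    return rank * (m + n)

def budget_penalty(gates_per_module, shapes, budget):
    """Hinge penalty: positive only when the soft parameter count exceeds the budget."""
    used = sum(low_rank_params(sh, g.sum()) for g, sh in zip(gates_per_module, shapes))
    return max(0.0, used - budget), used

# Hypothetical weight delta (fine-tuned minus base) for one layer.
rng = np.random.default_rng(0)
delta = rng.standard_normal((64, 32))

# Gate logits would be learned in practice; here they are fixed for illustration.
logits = np.linspace(4.0, -4.0, 32)
approx, gates = gated_low_rank(delta, logits)

# After optimization, gates can be thresholded into a hard rank per module.
hard_rank = int((gates > 0.5).sum())
penalty, used = budget_penalty([gates], [delta.shape], budget=1000.0)
print(hard_rank, low_rank_params(delta.shape, hard_rank), penalty > 0)
```

In a real training setup the gate logits would be optimized with an autodiff framework against a task loss plus the budget term, so that rank flows to whichever shared or expert modules benefit most from it. The NumPy version above only evaluates the quantities and does not compute gradients.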

Problem

Research questions and friction points this paper is trying to address.

dynamic model merging

parameter efficiency

accuracy-efficiency trade-off

model compression

low-rank adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic model merging

differentiable rank allocation

low-rank adaptation