🤖 AI Summary
Existing model merging methods suffer from global parameter interference or loss of task-specific details, limiting fusion performance. This paper proposes a consensus-aware localized fusion approach that jointly addresses the tension between global consistency and local detail preservation. The key contributions are: (1) a class-balanced entropy-minimization sampling strategy that improves representational balance across tasks without labels; (2) an efficiency-aware sequential merging framework that enables scalable, high-fidelity multi-task integration; and (3) a binary mask optimization mechanism based on consensus alignment that suppresses parameter conflicts. Experiments show the method significantly outperforms existing fusion baselines across multiple tasks, closely approaching joint-training performance while exhibiting superior robustness and generalization.
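To make the sampling idea concrete, here is a minimal sketch (not the paper's implementation; the function name and interface are hypothetical) of class-balanced entropy-minimization sampling: from unlabeled model predictions, keep the lowest-entropy (most confident) samples separately within each predicted class, so no class dominates the selected set.

```python
import numpy as np

def class_balanced_entropy_sampling(probs, k_per_class):
    """Hypothetical sketch: pick the k lowest-entropy unlabeled samples
    per *predicted* class, given an (N, C) array of softmax outputs."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # per-sample entropy
    pred = probs.argmax(axis=1)                             # pseudo-labels
    selected = []
    for c in range(probs.shape[1]):
        idx = np.where(pred == c)[0]                        # samples predicted as class c
        order = idx[np.argsort(entropy[idx])]               # most confident first
        selected.extend(order[:k_per_class].tolist())
    return sorted(selected)

# Toy example: 4 unlabeled samples, 2 classes, keep 1 per class.
probs = np.array([[0.90, 0.10],
                  [0.60, 0.40],
                  [0.20, 0.80],
                  [0.45, 0.55]])
picked = class_balanced_entropy_sampling(probs, k_per_class=1)
```

Selecting per predicted class rather than globally prevents a single easy class from supplying all the confident samples.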
📝 Abstract
Model merging aims to integrate the strengths of multiple fine-tuned models into a unified model while preserving task-specific capabilities. Existing methods, represented by task arithmetic, are typically classified into global-aware and local-aware methods. However, global-aware methods inevitably cause parameter interference, while local-aware methods struggle to keep task-specific details effective in the merged model. To address these limitations, we propose a Consensus-Aware Localized Merging (CALM) method that incorporates localized information aligned with global task consensus, ensuring its effectiveness post-merging. CALM consists of three key components: (1) class-balanced entropy-minimization sampling, which provides a more flexible and reliable way to leverage unsupervised data; (2) an efficiency-aware framework, which selects a small set of tasks for sequential merging with high scalability; and (3) consensus-aware mask optimization, which aligns localized binary masks with global task consensus and merges them conflict-free. Experiments demonstrate the superiority and robustness of CALM: it significantly outperforms existing methods and achieves performance close to traditional multi-task learning (MTL).
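As a rough illustration of the consensus-aligned masking idea (a sketch under simplifying assumptions, not CALM itself): one simple way to align localized binary masks with a global task consensus is to elect, per parameter, a consensus sign across the task vectors, mask out each task's deltas that disagree with it, and average only the surviving (conflict-free) deltas. All names below are hypothetical.

```python
import numpy as np

def consensus_mask_merge(base, task_weights):
    """Hypothetical sketch: merge fine-tuned weights by keeping, per
    parameter, only task deltas whose sign matches the consensus sign,
    then averaging the survivors onto the base model."""
    deltas = np.stack([w - base for w in task_weights])   # (T, ...) task vectors
    consensus_sign = np.sign(deltas.sum(axis=0))          # global consensus per parameter
    mask = (np.sign(deltas) == consensus_sign)            # localized binary masks
    kept = deltas * mask                                  # conflicting deltas zeroed out
    counts = mask.sum(axis=0)
    merged_delta = kept.sum(axis=0) / np.maximum(counts, 1)
    return base + merged_delta

# Toy example: two task vectors over a 4-parameter "model".
base = np.zeros(4)
tasks = [np.array([1.0, -1.0, 0.5, 0.0]),
         np.array([0.5,  1.0, 0.5, 0.0])]
merged = consensus_mask_merge(base, tasks)
```

In the toy example the second parameter has conflicting signs across tasks, so both deltas are masked out and the base value is kept there, while the agreeing parameters are averaged.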