MAGIC: Achieving Superior Model Merging via Magnitude Calibration

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing model merging methods overemphasize feature direction alignment while neglecting feature magnitude calibration, leading to degraded performance in merged models. This work systematically identifies, for the first time, the critical role of feature magnitude in merging robustness and proposes a plug-and-play magnitude calibration framework comprising three variants: Feature-Space Calibration (FSC, requiring a small set of unlabeled data), Weight-Space Calibration (WSC, entirely data-free), and Dual-Space Calibration (DSC, combining both spaces). The framework requires no fine-tuning or additional training, enabling unified integration of specialized model capabilities. Evaluated across eight computer vision datasets, it achieves an average improvement of 4.3%; on NLP tasks with Llama-series models, it yields an 8.0% gain. These results significantly outperform state-of-the-art training-free merging approaches and challenge the prevailing paradigm that relies solely on directional alignment.

📝 Abstract
The proliferation of pre-trained models has given rise to a wide array of specialised, fine-tuned models. Model merging aims to merge the distinct capabilities of these specialised models into a unified model, requiring minimal or even no additional training. A core objective of model merging is to ensure the merged model retains the behavioural characteristics of the specialised models, typically achieved through feature alignment. We identify that features consist of two critical components: direction and magnitude. Prior research has predominantly focused on directional alignment, while the influence of magnitude remains largely neglected, despite its pronounced vulnerability to perturbations introduced by common merging operations (e.g., parameter fusion and sparsification). Such perturbations to magnitude inevitably lead to feature deviations in the merged model from the specialised models, resulting in subsequent performance degradation. To address this, we propose MAGnItude Calibration (MAGIC), a plug-and-play framework that rectifies layer-wise magnitudes in feature and weight spaces, with three variants. Specifically, our Feature Space Calibration (FSC) realigns the merged model's features using a small set of unlabelled data, while Weight Space Calibration (WSC) extends this calibration to the weight space without requiring additional data. Combining these yields Dual Space Calibration (DSC). Comprehensive experiments demonstrate that MAGIC consistently boosts performance across diverse Computer Vision tasks (+4.3% on eight datasets) and NLP tasks (+8.0% on Llama) without additional training. Our code is available at: https://github.com/lyymuwu/MAGIC
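The direction/magnitude split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it only shows, under simple assumptions, that averaging two features of equal norm but different direction shrinks the magnitude, and that rescaling the merged feature to a reference magnitude restores the norm without changing direction. The function names, the toy vectors, and the choice of the mean specialised-model norm as the calibration target are all hypothetical.

```python
import math

def norm(v):
    """Euclidean (L2) norm of a feature vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def split_direction_magnitude(v):
    """Decompose v into (unit direction, magnitude)."""
    m = norm(v)
    if m == 0.0:
        return [0.0] * len(v), 0.0
    return [x / m for x in v], m

def calibrate_magnitude(merged_feat, reference_magnitude, eps=1e-12):
    """Rescale merged_feat so its norm equals reference_magnitude.

    Assumption: a single scalar per layer is enough; the paper's actual
    calibration rule may differ.
    """
    scale = reference_magnitude / max(norm(merged_feat), eps)
    return [x * scale for x in merged_feat]

# Toy features from two hypothetical specialised models (both have norm 5).
feat_a = [3.0, 4.0]
feat_b = [4.0, 3.0]

# Simple averaging, standing in for a generic merging operation.
merged = [(a + b) / 2 for a, b in zip(feat_a, feat_b)]

# Calibration target (assumed): mean magnitude of the specialised features.
target = (norm(feat_a) + norm(feat_b)) / 2
calibrated = calibrate_magnitude(merged, target)

merged_dir, merged_mag = split_direction_magnitude(merged)
calib_dir, calib_mag = split_direction_magnitude(calibrated)
```

Here `merged_mag` is smaller than the norm of either input even though the merged direction lies between them, which is the kind of magnitude perturbation the abstract refers to; `calibrated` keeps `merged_dir` and only changes the scale.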
Problem

Research questions and friction points this paper is trying to address.

Addresses feature magnitude neglect in model merging.
Proposes calibration to reduce performance degradation.
Enhances merged model accuracy without extra training.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Magnitude calibration framework for model merging
Feature and weight space calibration without extra training
Plug-and-play method improving vision and NLP tasks
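As a rough illustration of what a data-free, layer-wise calibration in weight space could look like, the sketch below rescales each layer of a merged task vector (merged weights minus pre-trained weights) so that its norm matches the mean norm of the individual task vectors for that layer. This summary does not give the exact WSC rule, so the target norm, the function names, and the dict-of-lists representation are assumptions made for the example.

```python
import math

def l2(v):
    """L2 norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))

def calibrate_task_vector(merged_tv, task_vectors):
    """Layer-wise rescaling of a merged task vector (hypothetical rule).

    merged_tv:    dict mapping layer name -> flat list of weight deltas
    task_vectors: list of dicts with the same layer names
    """
    calibrated = {}
    for layer, delta in merged_tv.items():
        # Assumed target: mean per-layer norm over the specialised models.
        target = sum(l2(tv[layer]) for tv in task_vectors) / len(task_vectors)
        current = l2(delta)
        scale = target / current if current > 0 else 1.0
        calibrated[layer] = [x * scale for x in delta]
    return calibrated

# Toy task vectors for two hypothetical fine-tuned models.
tv_a = {"layer0": [1.0, 0.0], "layer1": [0.0, 2.0]}
tv_b = {"layer0": [0.0, 1.0], "layer1": [2.0, 0.0]}

# Plain averaging as a stand-in for any merging method.
merged_tv = {
    name: [(a + b) / 2 for a, b in zip(tv_a[name], tv_b[name])]
    for name in tv_a
}

calibrated_tv = calibrate_task_vector(merged_tv, [tv_a, tv_b])
```

Because the rescaling uses only the weights themselves, no input data is needed, which matches the data-free property the summary attributes to WSC; whether the real method uses this particular target is not stated here.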
Yayuan Li
University of Michigan
AR-AI Instructional Agent, Instructional Videos, Video Generation, Vision and Language
Jian Zhang
State Key Laboratory for Novel Software Technology and the National Institute of Healthcare Data Science, Nanjing University, Nanjing, Jiangsu 210093, China
Jintao Guo
State Key Laboratory for Novel Software Technology and the National Institute of Healthcare Data Science, Nanjing University, Nanjing, Jiangsu 210093, China
Zihan Cheng
Shanghai Jiao Tong University Medical School Affiliated Ruijin Hospital, Shanghai 200025, China; and also with the National Institute of Healthcare Data Science, Nanjing University, Nanjing, Jiangsu 210093, China
Lei Qi
School of Computer Science and Engineering, Key Lab of Computer Network and Information Integration, Southeast University, Nanjing, Jiangsu 211189, China
Yinghuan Shi
State Key Laboratory for Novel Software Technology and the National Institute of Healthcare Data Science, Nanjing University, Nanjing, Jiangsu 210093, China
Yang Gao
State Key Laboratory for Novel Software Technology and the National Institute of Healthcare Data Science, Nanjing University, Nanjing, Jiangsu 210093, China