Why Do More Experts Fail? A Theoretical Analysis of Model Merging

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scalability bottleneck in model merging—specifically, the performance degradation observed as the number of experts increases. We establish, for the first time, a theoretical framework grounded in Gaussian width and approximate kinematics, revealing parameter-space saturation as the fundamental limiting factor; we further prove that performance gains exhibit strictly concave decay and admit a unique optimal merging threshold. Building on this insight, we propose Reparameterized Heavy-Tailed (RHT) merging, which alleviates saturation constraints via heavy-tailed reparameterization of expert weights. Extensive evaluation across 12 knowledge-intensive and general-purpose benchmarks demonstrates that RHT significantly delays performance decay and raises the upper bound for multi-task fusion. The implementation is open-sourced. To our knowledge, this is the first theoretically grounded paradigm for scalable model merging, offering provable guarantees on convergence behavior and capacity limits.

📝 Abstract
Model merging dramatically reduces storage and computational resources by combining multiple expert models into a single multi-task model. Although recent model merging methods have shown promising results, they struggle to maintain performance gains as the number of merged models increases. In this paper, we investigate the key obstacles that limit the scalability of model merging when integrating a large number of expert models. First, we prove that there is an upper bound on the performance of model merging. Further theoretical analysis reveals that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged. An analysis based on Gaussian width shows that the marginal benefit of merging additional models diminishes according to a strictly concave function, implying that the effective parameter space becomes rapidly saturated as the number of merged models increases. Furthermore, using Approximate Kinematics Theory, we prove the existence of a unique optimal threshold beyond which adding more models does not yield significant performance improvements. At the same time, we introduce a straightforward Reparameterized Heavy-Tailed method (RHT) to extend the coverage of the merged model, thereby enhancing its performance. Empirical results on 12 benchmarks, including both knowledge-intensive and general-purpose tasks, validate our theoretical analysis. We believe that these results will spark further research beyond the current scope of model merging. The source code is in the anonymous GitHub repository https://github.com/wzj1718/ModelMergingAnalysis.
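The merging setup the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the task-arithmetic averaging and the sign-preserving power transform in `heavy_tailed_reparam` (including its `alpha` parameter) are assumptions standing in for the actual RHT procedure.

```python
import numpy as np

def merge_task_vectors(base, experts, scale=1.0):
    """Average the experts' task vectors (expert - base) and add the
    mean back onto the base model (simple task-arithmetic merging)."""
    task_vectors = [w - base for w in experts]
    return base + scale * np.mean(task_vectors, axis=0)

def heavy_tailed_reparam(w, base, alpha=0.5):
    """Hypothetical heavy-tailed reparameterization of an expert's task
    vector: a sign-preserving power transform (alpha < 1 boosts small
    entries relative to large ones). Illustrative only; the paper's RHT
    transform is not reproduced here."""
    tv = w - base
    return base + np.sign(tv) * np.abs(tv) ** alpha

# Toy example: one base model and five synthetic experts.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
experts = [base + 0.1 * rng.normal(size=100) for _ in range(5)]

reparam = [heavy_tailed_reparam(w, base) for w in experts]
merged = merge_task_vectors(base, reparam)
print(merged.shape)  # (100,)
```

The reparameterization is applied per expert before averaging, so the merged model still lives in the same parameter space as the base.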
Problem

Research questions and friction points this paper is trying to address.

Investigates scalability limits of merging multiple expert models
Proves upper bound on model merging due to parameter constraints
Introduces method to enhance merged model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proves upper bound on model merging scalability
Introduces Reparameterized Heavy-Tailed method (RHT)
Uses Gaussian Width for marginal benefit analysis
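The claim that concave marginal gains plus a per-model cost yield a unique optimal merging threshold can be illustrated numerically. The gain curve below, `log(1 + k)`, is an assumed stand-in for the paper's Gaussian-width bound, and the fixed per-model cost is likewise hypothetical.

```python
import numpy as np

def optimal_threshold(cost_per_model=0.1, max_models=50):
    """With a strictly concave total gain g(k) and a linear per-model
    cost, the net benefit g(k) - c*k has a unique maximizer: the
    optimal number of experts to merge."""
    ks = np.arange(1, max_models + 1)
    gain = np.log1p(ks)              # assumed concave gain curve
    net = gain - cost_per_model * ks
    return int(ks[np.argmax(net)])

print(optimal_threshold())  # 9
```

Past this threshold each additional expert's marginal gain falls below its cost, mirroring the saturation behavior the paper proves.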
Zijing Wang
Northeastern University, China
Xingle Xu
Northeastern University, China
Yongkang Liu
Northeastern University, China
Yiqun Zhang
Northeastern University, China
Peiqin Lin
LMU Munich
Natural Language Processing, Multilinguality, Language Modeling, Sentiment Analysis
Shi Feng
Northeastern University, China
Xiaocui Yang
Lecturer, Northeastern University, China
Multimodal Sentiment Analysis, Data Mining, Multimodal Large Language Models
Daling Wang
Northeastern University, China
Hinrich Schütze
CIS, LMU Munich; MCML, Germany