Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success

📅 2026-01-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The mechanisms underlying successful model merging remain poorly understood, and a systematic account of "mergeability" is lacking. This work proposes an architecture-agnostic framework that applies linear optimization over a set of interpretable pairwise metrics, such as gradient L2 distance and subspace overlap, to identify the properties that correlate with post-merge performance across four merging methods. It argues that mergeability is not an intrinsic property of a model but depends on both the merging method and the partner tasks. The drivers of success vary substantially across methods (46.7% metric overlap; 55.3% sign agreement), giving each method a characteristic "fingerprint". Even so, subspace overlap and gradient alignment metrics consistently emerge as method-agnostic prerequisites for compatibility.

📝 Abstract

Model merging combines knowledge from separately fine-tuned models, yet success factors remain poorly understood. While recent work treats mergeability as an intrinsic property, we show with an architecture-agnostic framework that it fundamentally depends on both the merging method and the partner tasks. Using linear optimization over a set of interpretable pairwise metrics (e.g., gradient L2 distance), we uncover properties correlating with post-merge performance across four merging methods. We find substantial variation in success drivers (46.7% metric overlap; 55.3% sign agreement), revealing method-specific "fingerprints". Crucially, however, subspace overlap and gradient alignment metrics consistently emerge as foundational, method-agnostic prerequisites for compatibility. These findings provide a diagnostic foundation for understanding mergeability and motivate future fine-tuning strategies that explicitly encourage these properties.
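
To make the pairwise metrics named in the abstract concrete, the following is a minimal NumPy sketch of how such quantities could be computed for two fine-tuned models. The abstract does not give the paper's exact definitions, so everything here is an assumption made for illustration: the function names, the use of flattened gradient (or weight-delta) vectors as inputs, cosine similarity and sign agreement as stand-ins for "gradient alignment", and the normalized projection overlap of the top-k left singular subspaces as a stand-in for "subspace overlap".

```python
import numpy as np


def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two flattened gradient vectors."""
    return float(np.linalg.norm(a - b))


def cosine_alignment(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def sign_agreement(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of coordinates whose signs match (assumed definition)."""
    return float(np.mean(np.sign(a) == np.sign(b)))


def subspace_overlap(A: np.ndarray, B: np.ndarray, k: int) -> float:
    """Overlap of the top-k left singular subspaces of two matrices.

    Computes ||U_A^T U_B||_F^2 / k, which is 1.0 for identical subspaces
    and 0.0 for orthogonal ones. This is one common choice, assumed here.
    """
    U_a = np.linalg.svd(A, full_matrices=False)[0][:, :k]
    U_b = np.linalg.svd(B, full_matrices=False)[0][:, :k]
    return float(np.linalg.norm(U_a.T @ U_b, "fro") ** 2 / k)


if __name__ == "__main__":
    # Hypothetical inputs: random matrices standing in for two models' updates.
    rng = np.random.default_rng(0)
    delta_a = rng.normal(size=(64, 32))
    delta_b = delta_a + 0.5 * rng.normal(size=(64, 32))

    print("L2 distance:     ", l2_distance(delta_a.ravel(), delta_b.ravel()))
    print("cosine alignment:", cosine_alignment(delta_a.ravel(), delta_b.ravel()))
    print("sign agreement:  ", sign_agreement(delta_a.ravel(), delta_b.ravel()))
    print("subspace overlap:", subspace_overlap(delta_a, delta_b, k=8))
```

In a setup like the one the abstract describes, values of this kind would be computed for each pair of models and then related to post-merge performance. The linear optimization step itself is not sketched here because the abstract does not specify it.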

Problem

Research questions and friction points this paper is trying to address.

model merging

mergeability

gradient alignment

subspace overlap

fine-tuned models

Innovation

Methods, ideas, or system contributions that make the work stand out.

mergeability

interpretable metrics

subspace overlap