Model Merging by Output-Space Projection

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of merging multiple fine-tuned models into a single multi-task model without retraining. The authors formulate model merging as a convex quadratic program over residual updates; the resulting merge weights minimize a squared-output calibration objective computed from calibration inputs and the fine-tuned models' outputs. The framework supplies the formal optimality guarantees that heuristic methods such as task arithmetic, model soups, TIES, and DARE lack, recovers those methods as special cases, and introduces an interpretable closed-form diagnostic: the fraction of residual energy captured by a chosen basis. In the single-layer setting the method matches or outperforms existing techniques, and a sequential layer-wise extension yields consistent gains across language and vision benchmarks in multi-layer merging. The diagnostic predicts downstream merge quality using only the calibration set.

📝 Abstract

Model merging combines fine-tuned checkpoints into a single multi-task model without retraining. Existing methods, such as task arithmetic, model soups, TIES, and DARE, are computationally efficient and empirically successful, but rely on heuristic design choices and lack formal optimality guarantees. We show that merging can be formulated as a convex quadratic programme over residual updates, yielding weights that minimise a squared-output calibration objective using calibration inputs and fine-tuned model outputs, and subsuming existing methods as special cases. Our framework yields a closed-form diagnostic, the fraction of residual energy captured by a chosen basis, that predicts downstream merge quality using only the calibration set. Empirically, the QP matches or outperforms existing methods in the single-layer setting, and we characterise when the optimal basis provides significant gains over the cheaper diagonal QP. We extend to multi-layer merging via a sequential layer-wise algorithm and demonstrate consistent gains across language and vision benchmarks.
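The abstract does not spell out the objective, so the following is only a minimal sketch of one plausible reading for a single linear layer, not the authors' implementation. It assumes the merged update is a linear combination of basis matrices, that the target for each task is the output change its fine-tuned weights produce on that task's calibration inputs, and that the coefficients are unconstrained, in which case the quadratic programme reduces to ordinary least squares. The function name `merge_coefficients`, its arguments, and the exact form of the energy ratio are all hypothetical.

```python
import numpy as np

def merge_coefficients(basis, task_deltas, calib_inputs):
    """Least-squares coefficients for a merged update sum_k alpha_k * basis[k].

    Assumed objective (not taken from the paper):
        minimise over alpha  sum_t || (sum_k alpha_k B_k) X_t - Delta_t X_t ||_F^2
    which is an unconstrained convex quadratic in alpha.
    """
    # Column k: outputs of basis element k on every task's calibration inputs, flattened.
    A = np.stack(
        [np.concatenate([(B @ X).ravel() for X in calib_inputs]) for B in basis],
        axis=1,
    )
    # Target: the output change each fine-tuned model produces on its own inputs.
    r = np.concatenate([(D @ X).ravel() for D, X in zip(task_deltas, calib_inputs)])
    alpha, *_ = np.linalg.lstsq(A, r, rcond=None)
    # Share of the target's squared norm reproduced by the fitted combination.
    captured = float(np.sum((A @ alpha) ** 2) / np.sum(r ** 2))
    return alpha, captured

# Synthetic example: two tasks, with the task updates themselves used as the basis.
rng = np.random.default_rng(0)
d_out, d_in, n = 4, 6, 32
deltas = [0.1 * rng.standard_normal((d_out, d_in)) for _ in range(2)]
inputs = [rng.standard_normal((d_in, n)) for _ in range(2)]
alpha, captured = merge_coefficients(deltas, deltas, inputs)
```

Under these assumptions, using the task updates as the basis and fixing every coefficient to a shared constant corresponds to task-arithmetic-style merging, so the fitted coefficients can never do worse on this calibration objective. The `captured` ratio lies between 0 and 1 because the least-squares fit is an orthogonal projection of the target onto the span of the basis outputs.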

Problem

Research questions and friction points this paper is trying to address.

model merging

multi-task learning

optimality guarantees

fine-tuned models

quadratic programming

Innovation

Methods, ideas, or system contributions that make the work stand out.

model merging

output-space projection

convex quadratic programming