🤖 AI Summary
This work addresses limitations of existing model merging approaches, which either neglect the geometric structure of the loss landscape or rely on intractable full-space Hessian approximations, which limits how effectively knowledge can be integrated. The authors formulate model merging as computing the Fréchet mean on a Riemannian manifold, restricted to the low-rank subspace spanned by the task vectors and using the expected Hessian as the metric. This formulation connects local curvature to the epistemic uncertainty of the parameters. The authors derive an error bound for the merged model that decomposes into the subspace Fréchet variance and the residual energy. They also show that both curvature-aware methods and recent spectral methods are special cases of the framework under different geometric metrics. In experiments merging fine-tuned CLIP-ViT models on eight image classification tasks, the proposed method outperforms the baselines at matched rank on all three CLIP-ViT backbones, improving both average and worst-task accuracy across tasks.
📝 Abstract
Model merging offers a promising avenue for knowledge integration and parallel development without retraining. Yet, existing methods either ignore the geometry of the loss landscape or rely on intractable full-space Hessian approximations. We propose EpiMer (Epistemic Merging), a framework that casts model merging as computing the Fréchet mean on a Riemannian manifold and restricts the computation to the low-rank subspace spanned by the task vectors. With the expected Hessian as the metric, we reveal a connection between local curvature and the epistemic uncertainty of the parameters. Our theoretical analysis decomposes the merging error bound into the subspace Fréchet variance and the residual energy, and provides a closed-form characterization of when curvature-aware merging provably outperforms flat-geometry methods. In addition, our framework unifies both curvature-aware methods and recent spectral methods as special cases of the subspace Fréchet mean with different geometric metrics. When merging fine-tuned CLIP-ViT models on eight image classification tasks, EpiMer strictly outperforms the baselines at matched rank on all three CLIP-ViT backbones, improving both the across-task average accuracy and the worst-task accuracy on every backbone.
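To make the "subspace Fréchet mean" idea concrete, here is a minimal sketch under a strong simplifying assumption that is not taken from the paper: each task i contributes a fixed (constant) symmetric positive-definite metric H_i, for example a precomputed expected-Hessian or Fisher estimate. Under this assumption the squared distance to task i is the quadratic form (θ − θ_i)ᵀ H_i (θ − θ_i). Writing θ = θ₀ + U c, where U is an orthonormal basis of the task-vector span, the Fréchet mean then has the closed form (Σ Uᵀ H_i U) c = Σ Uᵀ H_i τ_i with τ_i = θ_i − θ₀. The function name `subspace_frechet_mean` and its `rank` parameter are hypothetical. This is not EpiMer's actual algorithm. With identity metrics the sketch reduces to plain task-vector averaging, which illustrates the flat-geometry case in this simplified setting.

```python
import numpy as np


def subspace_frechet_mean(theta0, thetas, metrics, rank=None):
    """Hypothetical sketch of a metric-weighted Fréchet mean in the task-vector span.

    ASSUMPTION: each metrics[i] is a constant SPD matrix H_i, so the distance to
    task i is the quadratic form (x - thetas[i])^T H_i (x - thetas[i]). This is
    a simplification for illustration, not the paper's method.
    """
    # Task vectors tau_i = theta_i - theta0, stacked as columns (d x T).
    taus = np.stack([t - theta0 for t in thetas], axis=1)
    # Orthonormal basis U of the task-vector span via thin SVD.
    U, s, _ = np.linalg.svd(taus, full_matrices=False)
    # Keep the numerically nonzero directions, or a user-chosen rank.
    k = rank if rank is not None else int(np.sum(s > 1e-10 * s[0]))
    U = U[:, :k]
    # Normal equations of sum_i ||theta0 + U c - theta_i||^2_{H_i} with respect to c.
    A = sum(U.T @ H @ U for H in metrics)
    b = sum(U.T @ H @ taus[:, i] for i, H in enumerate(metrics))
    c = np.linalg.solve(A, b)
    # Map the subspace coordinates back to parameter space.
    return theta0 + U @ c


if __name__ == "__main__":
    theta0 = np.zeros(2)
    thetas = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    # Flat geometry: identity metrics give the plain average of the task vectors.
    flat = subspace_frechet_mean(theta0, thetas, [np.eye(2)] * 2)
    # Curved geometry: task 1 is much "stiffer" along the first coordinate.
    curved = subspace_frechet_mean(theta0, thetas, [np.diag([9.0, 1.0]), np.eye(2)])
    print("flat:", flat)
    print("curved:", curved)
```

In the demo, the curved merge moves the first coordinate toward task 1, where its metric is larger. Intuitively, a high-curvature (low-uncertainty) direction for a task gets more weight. Restricting the solve to U keeps the linear system at size k × k rather than d × d, which reflects the motivation for working in the low-rank subspace.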