🤖 AI Summary
Existing model merging methods can be fragile and inconsistent because averaging in parameter space ignores the symmetries inherent in neural network architectures. This work reframes model merging as a Fréchet mean problem on a manifold, achieving symmetry-invariant fusion by minimizing the sum of geodesic distances. Within this differential-geometric framework, the choice of metric, manifold structure, and distance approximation are identified explicitly as the central design considerations. Building on this foundation, the authors derive a practical merging algorithm for LoRA adapters, whose symmetries give rise to a quotient manifold. The framework offers a unified geometric interpretation of merging strategies, recovering Fisher merging under simplifying assumptions, and the proposed algorithm is compared against commonly used LoRA merging approaches.
📝 Abstract
Model merging aims to combine multiple models into one without additional training. Naïve parameter-space averaging can be fragile under architectural symmetries, because the underlying parameter-space geometry does not take those symmetries into account. In this work we show that not only the geometry, but also the averaging procedure itself, must be symmetry-invariant to achieve symmetry-aware merges. Consequently, we propose a general solution: merging as Fréchet averaging, i.e., selecting parameters that minimize a sum of geodesic distances on an appropriate manifold. In this view, the key design choice is the overall geometry, i.e., the choice of metric, manifold, and distance approximation, which determines what it means for two models to be "close". We show that Fréchet averaging, combined with simplifying assumptions, contains Fisher merging as a special case. Building on this, we examine the particular case of low-rank adapters (LoRA), whose symmetries induce a distinct geometry: that of a quotient manifold. We outline the limitations of current LoRA merging methods, propose a practical algorithm for this setting, and show how it compares with other commonly used approaches.
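To make the symmetry issue concrete, the following is a minimal sketch, not the authors' algorithm. It assumes rank-1 LoRA adapters stored as factor pairs (B, A) with weight update BA, and uses the rescaling (B, A) → (sB, A/s), which leaves BA unchanged. All function names and numeric values are hypothetical, chosen only for illustration. The sketch contrasts averaging the factors with averaging the products.

```python
# Illustration only: hypothetical helpers and toy values, not the paper's method.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def scale(X, s):
    """Multiply every entry of X by the scalar s."""
    return [[s * x for x in row] for row in X]

def mean(X, Y):
    """Entrywise average of two matrices of the same shape."""
    return [[(x + y) / 2 for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def merge_factors(adapter1, adapter2):
    """Naive merge: average B and A separately, then form the product."""
    (B1, A1), (B2, A2) = adapter1, adapter2
    return matmul(mean(B1, B2), mean(A1, A2))

def merge_products(adapter1, adapter2):
    """Average the weight updates B @ A directly."""
    (B1, A1), (B2, A2) = adapter1, adapter2
    return mean(matmul(B1, A1), matmul(B2, A2))

# Two rank-1 adapters: B is 2x1, A is 1x2.
B1, A1 = [[1.0], [0.0]], [[1.0, 2.0]]
B2, A2 = [[0.0], [1.0]], [[3.0, 4.0]]

# Reparameterize the second adapter; its weight update B2 @ A2 is unchanged.
s = 4.0
B2s, A2s = scale(B2, s), scale(A2, 1 / s)

same_update = matmul(B2s, A2s) == matmul(B2, A2)
factor_merge_before = merge_factors((B1, A1), (B2, A2))
factor_merge_after = merge_factors((B1, A1), (B2s, A2s))
product_merge_before = merge_products((B1, A1), (B2, A2))
product_merge_after = merge_products((B1, A1), (B2s, A2s))

print("same update after rescaling:", same_update)
print("factor merge changed:", factor_merge_before != factor_merge_after)
print("product merge changed:", product_merge_before != product_merge_after)
```

In this toy case, averaging the factors gives a different merged update after the rescaling even though both inputs represent the same models, while averaging the products does not. The product average, however, is generally no longer low-rank, which is one reason a merge defined on the quotient geometry is of interest.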