Orthogonal Model Merging

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language model merging methods perform linear interpolation in Euclidean space, which disrupts the intrinsic geometric structure of pre-trained weights—such as their hyperspherical energy distribution—leading to performance degradation and catastrophic forgetting. This work proposes OrthoMerge, the first approach to formulate model merging on the Riemannian manifold of the orthogonal group. By using Lie algebra mappings to model task-specific orthogonal transformations and integrating an Orthogonal-Residual Decoupling strategy, OrthoMerge enables geometry-preserving, efficient fusion. The method is compatible with non-OFT fine-tuned models, substantially mitigates forgetting, and maintains or even enhances performance across multitask settings, demonstrating the importance of preserving the geometric structure of weights in model merging.
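The Lie-algebra merging idea can be sketched as follows: map each task-specific orthogonal matrix to its skew-symmetric generator with the matrix logarithm, average the generators in that flat tangent space, and map back with the matrix exponential, which returns an exactly orthogonal result. This is a minimal NumPy/SciPy illustration, assuming each adaptation is a special orthogonal matrix; `merge_orthogonal` and its uniform weighting are hypothetical simplifications, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import logm, expm

def merge_orthogonal(mats, weights=None):
    """Merge special orthogonal matrices via their Lie-algebra coordinates.

    Hypothetical sketch: the principal matrix logarithm of a rotation is
    skew-symmetric, so averaging the logs and exponentiating back stays on
    the orthogonal group (unlike Euclidean averaging of the matrices).
    """
    if weights is None:
        weights = np.full(len(mats), 1.0 / len(mats))
    # Map each Q_i to its skew-symmetric generator A_i = log(Q_i).
    gens = [np.real(logm(Q)) for Q in mats]
    # Weighted average in the tangent space (direction and intensity of
    # each adaptation are combined linearly here).
    A = sum(w * G for w, G in zip(weights, gens))
    # Map back to the group: exp of a skew-symmetric matrix is orthogonal.
    return expm(A)
```

For 2D rotations this reduces to averaging the rotation angles, so the merged matrix is itself a rotation, not a shrunken near-orthogonal blend as Euclidean averaging would produce.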

📝 Abstract
Merging finetuned Large Language Models (LLMs) has become increasingly important for integrating diverse capabilities into a single unified model. However, prevailing model merging methods rely on linear arithmetic in Euclidean space, which often destroys the intrinsic geometric properties of pretrained weights, such as hyperspherical energy. To address this, we propose Orthogonal Model Merging (OrthoMerge), a method that performs merging operations on the Riemannian manifold formed by the orthogonal group to preserve the geometric structure of the model's weights. By mapping task-specific orthogonal matrices learned by Orthogonal Finetuning (OFT) to the Lie algebra, OrthoMerge enables a principled yet efficient integration that accounts for both the direction and intensity of adaptations. Beyond directly leveraging orthogonal matrices obtained by OFT, we further extend this approach to general models finetuned with non-OFT methods (e.g., low-rank finetuning, full finetuning) via an Orthogonal-Residual Decoupling strategy. This technique extracts the orthogonal components of expert models by solving the orthogonal Procrustes problem; these components are then merged on the manifold of the orthogonal group, while the remaining linear residuals are processed through standard additive merging. Extensive empirical results demonstrate the effectiveness of OrthoMerge in mitigating catastrophic forgetting and maintaining model performance across diverse tasks.
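The Orthogonal-Residual Decoupling step above can be sketched via the classical SVD solution to the orthogonal Procrustes problem. The following minimal NumPy illustration assumes square weight matrices; `orthogonal_residual_decouple` and its exact residual definition are our own simplifications for exposition, not the paper's formulation.

```python
import numpy as np

def orthogonal_residual_decouple(W_pre, W_ft):
    """Split a finetuned weight into an orthogonal transform plus a residual.

    Hypothetical sketch: solve the orthogonal Procrustes problem
        min_Q ||Q @ W_pre - W_ft||_F  subject to  Q.T @ Q = I
    whose closed-form solution is Q = U @ V.T, where U @ S @ V.T is the
    SVD of W_ft @ W_pre.T. The leftover R = W_ft - Q @ W_pre is the linear
    residual, which would be merged additively.
    """
    U, _, Vt = np.linalg.svd(W_ft @ W_pre.T)
    Q = U @ Vt                  # best orthogonal approximation of the update
    R = W_ft - Q @ W_pre        # linear residual not captured by Q
    return Q, R
```

When the finetuning update happens to be exactly orthogonal (W_ft = Q0 @ W_pre), the SVD recovers Q0 and the residual vanishes, so the decomposition degrades gracefully to the pure-OFT case.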
Problem

Research questions and friction points this paper is trying to address.

Model Merging
Geometric Structure Preservation
Catastrophic Forgetting
Orthogonal Group
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal Model Merging
Riemannian manifold
Orthogonal Finetuning
Orthogonal Procrustes problem
Geometric structure preservation