🤖 AI Summary
Transformer model merging is hindered by the discrete nature of permutation symmetry, which yields suboptimal parameter alignment and limits generalization. Method: We establish, for the first time, the existence of continuous rotational symmetry in self-attention layers, going beyond conventional discrete permutation constraints. Leveraging this property, we derive continuous equivalence classes in parameter space and devise a gradient-based optimal parameter matching algorithm with provable performance guarantees. Our approach integrates rotational matrix modeling, symmetry analysis, and architecture-aware adaptation for Transformers. Results: The method significantly outperforms state-of-the-art permutation-based merging techniques on diverse NLP and multimodal vision benchmarks, demonstrating consistent improvements in accuracy and robustness. The implementation is publicly available.
📝 Abstract
Symmetry in the parameter space of deep neural networks (DNNs) has proven beneficial for various deep learning applications. A well-known example is the permutation symmetry in Multi-Layer Perceptrons (MLPs), where permuting the rows of weight matrices in one layer and applying the inverse permutation to adjacent layers yields a functionally equivalent model. While permutation symmetry fully characterizes the equivalence set for MLPs, its discrete nature limits its utility for transformers. In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry by rotating parameter matrices in self-attention layers. Unlike permutation symmetry, rotation symmetry operates in a continuous domain, thereby significantly expanding the equivalence set for transformers. Based on this property, we propose a theoretically optimal parameter matching algorithm as a plug-and-play module to enhance model fusion. We evaluate our approach using pre-trained transformers across diverse natural language and vision tasks. Experimental results demonstrate that our rotation symmetry-based matching algorithm substantially improves model fusion, highlighting the potential of parameter space symmetry to facilitate model fusion. Our code is available at https://github.com/zhengzaiyi/RotationSymmetry.
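To make the rotation-symmetry claim concrete, the sketch below checks one instance of it in NumPy: since attention scores depend on the query and key projections only through the product `W_Q @ W_K.T`, right-multiplying both projections by the same orthogonal matrix `R` (so that `R @ R.T = I`) leaves the scores unchanged. The dimensions and random weights are illustrative assumptions, not the paper's actual setup, and this shows only the symmetry itself, not the authors' matching algorithm.

```python
import numpy as np

# Illustrative (hypothetical) dimensions, not taken from the paper.
rng = np.random.default_rng(0)
d_model, d_head, seq_len = 16, 8, 5

X = rng.standard_normal((seq_len, d_model))    # token representations
W_Q = rng.standard_normal((d_model, d_head))   # query projection
W_K = rng.standard_normal((d_model, d_head))   # key projection

# Build an arbitrary orthogonal matrix R via QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((d_head, d_head)))

# Attention scores Q K^T with the original and rotated projections.
scores_orig = (X @ W_Q) @ (X @ W_K).T
scores_rot = (X @ (W_Q @ R)) @ (X @ (W_K @ R)).T

# (W_Q R)(W_K R)^T = W_Q (R R^T) W_K^T = W_Q W_K^T, so the models agree.
print(np.allclose(scores_orig, scores_rot))
```

Because `R` ranges over a continuous group of orthogonal matrices rather than a finite set of permutations, the set of functionally equivalent parameterizations is much larger, which is what the proposed matching algorithm exploits.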