Generalized Linear Mode Connectivity for Transformers

📅 2025-06-27
🤖 AI Summary
Understanding the geometric structure of loss landscapes is a central problem in deep learning. While linear mode connectivity (LMC) reveals low- or zero-loss linear paths between independently trained models, conventional symmetry modeling, which relies solely on neuron permutations, fails to capture the richer symmetries of modern architectures such as Transformers. This work introduces a symmetry-aware generalized LMC framework that unifies four symmetry classes: permutations, semi-permutations, orthogonal transformations, and invertible mappings. Building on this framework, the authors propose symmetry-adapted model reparameterization and optimized path-interpolation techniques. Experiments demonstrate, for the first time, low- or zero-barrier linear connections between independently trained Vision Transformers and GPT-2 models, providing empirical evidence of shared geometric structure in neural loss landscapes across these architectures.
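The simplest of the four symmetry classes, neuron permutation, can be illustrated in a few lines. This is a minimal sketch, not the paper's code: permuting the hidden units of a two-layer ReLU MLP, with the matching inverse permutation applied to the next layer, leaves the network's function unchanged.

```python
# Sketch (illustrative, not from the paper): a two-layer ReLU MLP is
# invariant to permuting its hidden units, provided the same permutation
# is applied to the rows of W1/b1 and the columns of W2.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2  # ReLU hidden layer

P = np.eye(8)[rng.permutation(8)]            # random permutation matrix
x = rng.normal(size=4)
orig = mlp(x, W1, b1, W2, b2)
perm = mlp(x, P @ W1, P @ b1, W2 @ P.T, b2)  # reparameterized model
assert np.allclose(orig, perm)               # functionally identical
```

Aligning two independently trained models under such a symmetry before interpolating is what makes them appear connected rather than lying in separate basins.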

📝 Abstract
Understanding the geometry of neural network loss landscapes is a central question in deep learning, with implications for generalization and optimization. A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths, despite appearing to lie in separate loss basins. However, this is often obscured by symmetries in parameter space -- such as neuron permutations -- which make functionally equivalent models appear dissimilar. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope and fail to capture the richer symmetries exhibited by modern architectures such as Transformers. In this work, we introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, orthogonal transformations, and general invertible maps -- broadening the set of valid reparameterizations and subsuming many previous approaches as special cases. Crucially, this generalization enables, for the first time, the discovery of low- and zero-barrier linear interpolation paths between independently trained Vision Transformers and GPT-2 models. These results reveal deeper structure in the loss landscape and underscore the importance of symmetry-aware analysis for understanding model space geometry.
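The "low- and zero-barrier linear interpolation paths" the abstract refers to can be made concrete with a toy sketch. The loss barrier along the straight line between two parameter vectors is the worst interpolated loss minus the linearly interpolated endpoint losses; a convex quadratic stands in here for a real network loss, and all names are illustrative.

```python
# Hedged sketch of measuring a linear-interpolation loss barrier.
# A toy quadratic loss stands in for a trained network's loss.
import numpy as np

def loss(theta):                      # toy stand-in for network loss
    return float(np.sum((theta - 1.0) ** 2))

theta_a = np.zeros(5)                 # two "independently trained" models
theta_b = 2.0 * np.ones(5)

alphas = np.linspace(0.0, 1.0, 11)
path_losses = [loss((1 - a) * theta_a + a * theta_b) for a in alphas]
endpoint_line = [(1 - a) * loss(theta_a) + a * loss(theta_b) for a in alphas]
barrier = max(p - e for p, e in zip(path_losses, endpoint_line))
# A barrier near zero (or negative) means the two models are linearly
# mode-connected; a large positive barrier suggests separate basins.
```

In the paper's setting, the models are first reparameterized under the discovered symmetry before this interpolation is evaluated, which is what lowers the barrier.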
Problem

Research questions and friction points this paper is trying to address.

Understanding loss landscape geometry in deep learning
Extending symmetry classes for model reparameterization
Enabling linear connectivity between Transformer models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for four symmetry classes
Enables low-barrier paths for Transformers
Broadens reparameterization with invertible maps
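The last point, broadening reparameterization beyond permutations, can be sketched for a purely linear block: inserting any invertible map T and its inverse between two layers preserves the function. This is an illustrative toy, not the paper's method; with nonlinearities in between, only restricted classes (such as permutations or semi-permutations) remain valid, which is why the framework distinguishes four classes.

```python
# Sketch of the general invertible-map symmetry class for a linear block:
# (W2 T^{-1})(T W1) x == W2 W1 x for ANY invertible T. Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(6, 4))
W2 = rng.normal(size=(3, 6))
T = rng.normal(size=(6, 6)) + 6.0 * np.eye(6)  # well-conditioned invertible map

x = rng.normal(size=4)
orig = W2 @ (W1 @ x)
repar = (W2 @ np.linalg.inv(T)) @ ((T @ W1) @ x)  # reparameterized block
assert np.allclose(orig, repar)
```

Permutations and orthogonal transformations are special cases of such T, so this class subsumes the earlier permutation-only approaches.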