Analyzing the Role of Permutation Invariance in Linear Mode Connectivity

📅 2025-03-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how permutation invariance affects loss barriers to linear mode connectivity (LMC) in the teacher–student setting of two-layer ReLU networks. Theoretically, we prove that for wide student networks, the LMC loss barrier converges to zero at rate $O(m^{-1/2})$ with respect to width $m$, free of the curse of dimensionality. Crucially, we identify, for the first time, a double-descent behavior in the barrier, tightly linked to a sparsity transition in gradient descent (GD) solutions: high learning rates induce sparse parameterizations, substantially reducing the LMC barrier. Methodologically, we integrate a teacher–student analytical framework, permutation alignment techniques, and GD/SGD dynamical modeling, validated empirically on synthetic data and MNIST. Additional experiments confirm the phenomenon's consistency in deeper networks. Our core contribution is establishing a quantitative relationship among permutation invariance, network width, optimization-induced sparsity, and the LMC loss barrier.
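The procedure the summary describes, aligning one student's neurons to the other's by a permutation and then measuring the worst-case excess loss along the linear interpolation, can be sketched as below. This is a minimal illustration, not the paper's exact construction: the Hungarian-matching cost (weight inner products) and the mean-squared-error teacher–student loss are assumptions chosen for simplicity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def loss(W, a, X, y):
    # Two-layer ReLU network f(x) = a^T relu(W x); mean squared error.
    preds = np.maximum(X @ W.T, 0.0) @ a
    return np.mean((preds - y) ** 2)

def align(W1, a1, W2, a2):
    # Match neurons of network 2 to network 1 by weight similarity,
    # via a Hungarian assignment on pairwise inner products
    # (one simple choice of alignment cost, assumed here).
    cost = -(W1 @ W2.T + np.outer(a1, a2))
    _, perm = linear_sum_assignment(cost)
    return W2[perm], a2[perm]

def lmc_barrier(W1, a1, W2, a2, X, y, ts=np.linspace(0.0, 1.0, 21)):
    # Maximum loss along the linear path, minus the worse endpoint loss.
    end = max(loss(W1, a1, X, y), loss(W2, a2, X, y))
    path = [loss((1 - t) * W1 + t * W2, (1 - t) * a1 + t * a2, X, y)
            for t in ts]
    return max(path) - end
```

As a sanity check, two copies of the same network that differ only by a neuron permutation have a large barrier before alignment but a zero barrier after it, which is exactly the "modulo permutation" effect the paper quantifies.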

📝 Abstract
It was empirically observed in Entezari et al. (2021) that when accounting for the permutation invariance of neural networks, there is likely no loss barrier along the linear interpolation between two SGD solutions -- a phenomenon known as linear mode connectivity (LMC) modulo permutation. This phenomenon has sparked significant attention due to both its theoretical interest and practical relevance in applications such as model merging. In this paper, we provide a fine-grained analysis of this phenomenon for two-layer ReLU networks under a teacher-student setup. We show that as the student network width $m$ increases, the LMC loss barrier modulo permutation exhibits a **double descent** behavior. Particularly, when $m$ is sufficiently large, the barrier decreases to zero at a rate $O(m^{-1/2})$. Notably, this rate does not suffer from the curse of dimensionality and demonstrates how substantial permutation can reduce the LMC loss barrier. Moreover, we observe a sharp transition in the sparsity of GD/SGD solutions when increasing the learning rate and investigate how this sparsity preference affects the LMC loss barrier modulo permutation. Experiments on both synthetic and MNIST datasets corroborate our theoretical predictions and reveal a similar trend for more complex network architectures.
Problem

Research questions and friction points this paper is trying to address.

Analyzes linear mode connectivity in neural networks.
Explores double descent behavior in loss barriers.
Investigates sparsity transition effects on connectivity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes the LMC loss barrier in ReLU networks.
Demonstrates double descent behavior with width.
Investigates sparsity impact on the LMC loss barrier.
Keyao Zhan
Department of Biostatistics, Harvard T.H. Chan School of Public Health
Puheng Li
Statistics PhD Student, Stanford University
Statistics · Machine Learning · Generative AI
Lei Wu
School of Mathematical Sciences, Peking University; Center for Machine Learning Research, Peking University