Do We Really Need Permutations? Impact of Width Expansion on Linear Mode Connectivity

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Linear mode connectivity (LMC) typically requires parameter permutations to construct low-loss linear interpolation paths between distinct minima, imposing structural constraints and computational overhead. Method: This work investigates whether model width expansion, without any parameter permutation, can inherently enable LMC. We propose scaling model width coupled with softmax temperature calibration to directly construct low-loss linear connections. We further introduce the Layer-wise Exponentially Weighted Connectivity (LEWC) mechanism, showing theoretically and empirically that each layer of the merged wide model outputs an exponentially weighted sum of the corresponding layers' outputs of the original models, so the merged model's prediction matches that of an ensemble of the originals. Results: Our approach achieves LMC performance on par with permutation-based methods on CIFAR-10, CIFAR-100, and ImageNet, while eliminating the need for parameter rearrangement. It significantly reduces architectural constraints and computational cost, providing the first unified empirical and theoretical evidence that width expansion alone suffices to induce LMC.

📝 Abstract
Recently, Ainsworth et al. empirically demonstrated that, given two independently trained models, applying a parameter permutation that preserves the input-output behavior allows the two models to be connected by a low-loss linear path. When such a path exists, the models are said to achieve linear mode connectivity (LMC). Prior studies, including Ainsworth et al., have reported that achieving LMC requires not only an appropriate permutation search but also sufficiently wide models (e.g., a 32$\times$ width multiplier for ResNet-20). This is broadly believed to be because increasing the model width ensures a large enough space of candidate permutations, increasing the chance of finding one that yields LMC. In this work, we empirically demonstrate that, even without any permutations, simply widening the models is sufficient for achieving LMC when using a suitable softmax temperature calibration. We further explain why this phenomenon arises by analyzing intermediate layer outputs. Specifically, we introduce layerwise exponentially weighted connectivity (LEWC), which states that the output of each layer of the merged model can be represented as an exponentially weighted sum of the outputs of the corresponding layers of the original models. Consequently, the merged model's output matches that of an ensemble of the original models, which facilitates LMC. To the best of our knowledge, this work is the first to show that widening the model not only facilitates nonlinear mode connectivity, as suggested in prior research, but also significantly increases the possibility of achieving linear mode connectivity.
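The interpolation-plus-temperature idea described in the abstract can be sketched as a minimal, self-contained example. This is our own illustration, not the paper's code: the toy linear "model", the function names, and the loss-along-the-path routine are all assumptions. It sweeps the interpolation coefficient along the linear path between two parameter sets and evaluates a temperature-calibrated cross-entropy at each point.

```python
import numpy as np

def interpolate_params(theta_a, theta_b, alpha):
    """Linear interpolation of two parameter dicts: (1 - alpha) * A + alpha * B."""
    return {k: (1 - alpha) * theta_a[k] + alpha * theta_b[k] for k in theta_a}

def softmax(logits, temperature=1.0):
    """Temperature-calibrated softmax; temperature > 1 softens the distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def interpolation_loss(theta_a, theta_b, forward, x, y, temperature, n_points=11):
    """Cross-entropy at evenly spaced points on the linear path from theta_a to theta_b."""
    losses = []
    for alpha in np.linspace(0.0, 1.0, n_points):
        theta = interpolate_params(theta_a, theta_b, alpha)
        probs = softmax(forward(theta, x), temperature)
        losses.append(-np.log(probs[np.arange(len(y)), y] + 1e-12).mean())
    return np.array(losses)

def forward(theta, x):
    """Toy linear classifier standing in for a (wide) network."""
    return x @ theta["W"] + theta["b"]
```

A flat loss curve over the sweep would indicate LMC; a pronounced barrier near the midpoint would indicate its absence.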
Problem

Research questions and friction points this paper is trying to address.

Investigating the impact of width expansion on linear mode connectivity
Challenging the necessity of permutations for connecting trained models
Proposing softmax temperature calibration to achieve connectivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Widening models enables linear connectivity without permutations
Softmax temperature calibration facilitates merging model outputs
Layerwise exponentially weighted connectivity explains ensemble-like behavior
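The exponential weighting behind LEWC can be illustrated with a back-of-envelope calculation (our own toy sketch, not the paper's derivation), assuming two-layer linear networks without biases whose weights are merged as $W_\alpha = (1-\alpha)W_A + \alpha W_B$. Expanding the composition shows every term carries a weight that is a product of powers of $\alpha$ and $(1-\alpha)$:

```latex
W^{(2)}_\alpha W^{(1)}_\alpha x
  = \big((1-\alpha)W^{(2)}_A + \alpha W^{(2)}_B\big)\big((1-\alpha)W^{(1)}_A + \alpha W^{(1)}_B\big)x
  = (1-\alpha)^2\, W^{(2)}_A W^{(1)}_A x
  + \alpha(1-\alpha)\big(W^{(2)}_A W^{(1)}_B + W^{(2)}_B W^{(1)}_A\big)x
  + \alpha^2\, W^{(2)}_B W^{(1)}_B x.
```

With more layers the pure-model terms $W_A \cdots W_A$ and $W_B \cdots W_B$ are weighted by $(1-\alpha)^L$ and $\alpha^L$, with cross terms filling in the remaining binomial weights, which is consistent with the ensemble-like, exponentially weighted mixing of the original models' layer outputs that LEWC describes.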