Emergent Low-Rank Training Dynamics in MLPs with Smooth Activations

📅 2026-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the empirically observed phenomenon that weights in nonlinear multilayer perceptrons (MLPs) often evolve within a low-dimensional subspace during training, despite the lack of theoretical understanding. We provide the first precise characterization of this low-rank dynamics, proving that under smooth activation functions and gradient descent, the weight trajectories remain confined to an invariant low-dimensional subspace determined by initialization. By integrating dynamical systems theory, gradient flow analysis, and low-rank matrix approximation, we uncover the structure and emergence mechanism of this subspace and leverage these insights to design a low-rank parameterization scheme. Experiments across multiple classification tasks demonstrate that low-rank MLPs initialized within this subspace achieve performance comparable to their full-parameter counterparts, confirming the generalizability and practical relevance of our theoretical findings.

Technology Category

Application Category

📝 Abstract
Recent empirical evidence has demonstrated that the training dynamics of large-scale deep neural networks occur within low-dimensional subspaces. While this has inspired new research into low-rank training, compression, and adaptation, theoretical justification for these dynamics in nonlinear networks remains limited. %compared to deep linear settings. To address this gap, this paper analyzes the learning dynamics of multi-layer perceptrons (MLPs) under gradient descent (GD). We demonstrate that the weight dynamics concentrate within invariant low-dimensional subspaces throughout training. Theoretically, we precisely characterize these invariant subspaces for two-layer networks with smooth nonlinear activations, providing insight into their emergence. Experimentally, we validate that this phenomenon extends beyond our theoretical assumptions. Leveraging these insights, we empirically show there exists a low-rank MLP parameterization that, when initialized within the appropriate subspaces, matches the classification performance of fully-parameterized counterparts on a variety of classification tasks.
Problem

Research questions and friction points this paper is trying to address.

low-rank dynamics
training dynamics
MLPs
smooth activations
invariant subspaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

low-rank training
MLP dynamics
invariant subspaces
smooth activations
gradient descent
🔎 Similar Papers
No similar papers found.