Neural Rank Collapse: Weight Decay and Small Within-Class Variability Yield Low-Rank Bias

📅 2024-02-06
🏛️ arXiv.org
📈 Citations: 8
Influential: 0
🤖 AI Summary
This work investigates the origin of low-rank bias in deep neural networks and its connection to neural collapse. For general feedforward networks with nonlinear activations, we propose the "neural rank collapse" mechanism: weight decay, acting jointly with small within-class variability in the hidden layers, drives rapid singular value decay in the weight matrices and thereby progressive rank reduction. We establish, for the first time in nonlinear deep networks, a quantitative theoretical link between low-rank bias and neural collapse, extending beyond existing linear-network analyses: our theory shows that each layer's rank decreases in proportion to the within-class variability of the preceding layer's hidden representations. We validate the mechanism empirically through singular value analysis, statistical modeling of latent-space distributions, and experiments across architectures (ResNet, CNN). Leveraging this insight, we further achieve controllable rank compression of weight matrices by over 30% without sacrificing accuracy.
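The rank compression mentioned above can be sketched as truncated-SVD pruning of a trained weight matrix, i.e. discarding singular values below a relative tolerance. This is an illustrative sketch only, not the paper's code: the weight matrix here is synthetic, built with a geometrically decaying spectrum to mimic the singular value decay the paper describes.

```python
import numpy as np

# Synthetic stand-in for a trained layer's weight matrix: random factors
# around a geometrically decaying spectrum, mimicking singular value decay.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)) @ np.diag(0.9 ** np.arange(256)) \
    @ rng.standard_normal((256, 256))

# Truncated SVD: keep only singular values above a relative tolerance.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
tol = 1e-3 * s[0]
r = int(np.sum(s > tol))
W_lowrank = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

rel_err = np.linalg.norm(W - W_lowrank) / np.linalg.norm(W)
print(f"kept rank {r} of {min(W.shape)}, relative Frobenius error {rel_err:.2e}")
```

Storing the factors `U[:, :r]`, `s[:r]`, `Vt[:r, :]` instead of `W` reduces parameters from 256² to roughly 2·256·r, which is the source of the size savings when r is small.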

📝 Abstract
Recent work in deep learning has shown strong empirical and theoretical evidence of an implicit low-rank bias: weight matrices in deep networks tend to be approximately low-rank and removing relatively small singular values during training or from available trained models may significantly reduce model size while maintaining or even improving model performance. However, the majority of the theoretical investigations around low-rank bias in neural networks deal with oversimplified deep linear networks. In this work, we consider general networks with nonlinear activations and the weight decay parameter, and we show the presence of an intriguing neural rank collapse phenomenon, connecting the low-rank bias of trained networks with networks' neural collapse properties: as the weight decay parameter grows, the rank of each layer in the network decreases proportionally to the within-class variability of the hidden-space embeddings of the previous layers. Our theoretical findings are supported by a range of experimental evaluations illustrating the phenomenon.
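The two quantities the abstract relates, a layer's numerical rank and the within-class variability of hidden-space embeddings, can both be measured directly. The sketch below uses hypothetical helper names and toy Gaussian data (not the paper's experiments) to show that embeddings tightly clustered around their class means exhibit both low within-class variability and low numerical rank.

```python
import numpy as np

def within_class_variability(H, labels):
    """Mean squared distance of embeddings H to their class means."""
    total = 0.0
    for c in np.unique(labels):
        Hc = H[labels == c]
        total += np.sum((Hc - Hc.mean(axis=0)) ** 2)
    return total / len(labels)

def numerical_rank(M, tol=1e-2):
    """Number of singular values above tol * largest singular value."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Toy data: 10 classes, 50 points each, in a 64-dimensional hidden space.
rng = np.random.default_rng(1)
means = rng.standard_normal((10, 64))
labels = np.repeat(np.arange(10), 50)
tight = means[labels] + 0.01 * rng.standard_normal((500, 64))   # near-collapsed
diffuse = means[labels] + 1.0 * rng.standard_normal((500, 64))  # high variability

print("variability:", within_class_variability(tight, labels),
      within_class_variability(diffuse, labels))
print("numerical rank:", numerical_rank(tight), numerical_rank(diffuse))
```

With near-collapsed embeddings the data matrix is numerically rank 10 (one direction per class mean), while the diffuse embeddings fill the full 64-dimensional space; this is the within-class-variability/rank coupling the abstract describes.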
Problem

Research questions and friction points this paper is trying to address.

Proving deep neural collapse links to low-rank matrices
Establishing global optimality of collapsed configurations
Explaining absence of loss barriers between minima
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proves global optimality of neural collapse
Links neural collapse to low-rank matrices
Forecasts singular value structure pre-training