The Implicit Bias of Depth: From Neural Collapse to Softmax Codes

📅 2026-05-21

🤖 AI Summary
This work investigates why deep networks trained without explicit L2 regularization can deviate from Neural Collapse (NC), and traces the deviation to an implicit low-rank bias induced by depth. The study uses the deep unconstrained feature model, which is equivalent to a deep linear network with orthogonal inputs. It combines analysis under spectral initialization, tracking of singular-value dynamics, and gradient flow analysis under multiclass cross-entropy loss, and gives the first asymptotic and dynamic characterization of this depth-induced implicit bias. The results indicate that greater depth favors low-rank feature solutions. The authors argue that these solutions correspond to softmax codes, the maximum-margin solutions previously found in width-bottlenecked networks. An early-stage repulsion among singular values drives this low-rank emergence, and depth shrinks NC's basin of attraction. Under random initialization, by contrast, larger width favors higher-rank solutions, which clarifies the opposing roles of depth and width in representation learning.
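The sketch below makes the setting in the summary concrete. It is a minimal pure-Python model of an unregularized deep linear network with orthogonal inputs. The inputs are taken to be the K × K identity, with one sample per class, so the logits are simply the end-to-end product of the weight matrices. The sketch runs plain gradient descent on multiclass cross-entropy and records two quantities at each step: the loss, and the stable rank ||Z||_F^2 / ||Z||_2^2 of the logit matrix, used here as a rank proxy. The following are all assumptions made for illustration and are not the paper's experimental protocol: the hidden width, depth, learning rate, Gaussian initialization, and the stable-rank proxy. All function names are hypothetical. No claim is made about which geometry a given run reaches.

```python
import math
import random


def matmul(A, B):
    # Plain-Python product of two matrices stored as lists of rows.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def chain(mats, n):
    # Returns mats[-1] @ ... @ mats[0]; the n x n identity when mats is empty.
    P = identity(n)
    for W in mats:
        P = matmul(W, P)
    return P


def cross_entropy_and_grad(Z):
    # Z[j][k]: logit of class j for the single training sample of class k.
    # With identity (orthogonal) inputs, the network output is the product itself.
    K = len(Z)
    loss = 0.0
    G = [[0.0] * K for _ in range(K)]
    for k in range(K):
        col = [Z[j][k] for j in range(K)]
        mx = max(col)  # subtract the max for numerical stability
        exps = [math.exp(z - mx) for z in col]
        s = sum(exps)
        loss += (mx + math.log(s) - col[k]) / K  # mean of -log softmax(col)[k]
        for j in range(K):
            G[j][k] = (exps[j] / s - (1.0 if j == k else 0.0)) / K
    return loss, G


def stable_rank(Z, iters=200):
    # ||Z||_F^2 / ||Z||_2^2; the top eigenvalue of Z^T Z comes from power iteration.
    M = matmul(transpose(Z), Z)
    v = [1.0 / (i + 1) for i in range(len(M))]  # non-constant start vector
    nv = math.sqrt(sum(x * x for x in v))
    v = [x / nv for x in v]
    top = 0.0
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in M]
        top = math.sqrt(sum(x * x for x in w))
        if top == 0.0:
            return 0.0
        v = [x / top for x in w]
    return sum(x * x for row in Z for x in row) / top


def train_deep_ufm(num_classes=4, width=6, depth=3, lr=0.1, steps=200,
                   init_scale=1.0, seed=0):
    # Hypothetical setup: dims K -> width -> ... -> width -> K, no weight decay.
    rng = random.Random(seed)
    dims = [num_classes] + [width] * (depth - 1) + [num_classes]
    Ws = [[[rng.gauss(0.0, init_scale / math.sqrt(dims[l])) for _ in range(dims[l])]
           for _ in range(dims[l + 1])] for l in range(depth)]
    history = []
    for _ in range(steps):
        Z = chain(Ws, num_classes)  # K x K logits = end-to-end product
        loss, G = cross_entropy_and_grad(Z)
        grads = []
        for l in range(depth):
            below = chain(Ws[:l], num_classes)    # Ws[l-1] @ ... @ Ws[0]
            above = chain(Ws[l + 1:], dims[l + 1])  # Ws[-1] @ ... @ Ws[l+1]
            # For Z = above @ Ws[l] @ below: dLoss/dWs[l] = above^T G below^T.
            grads.append(matmul(matmul(transpose(above), G), transpose(below)))
        for W, dW in zip(Ws, grads):  # update only after all gradients are computed
            for i in range(len(W)):
                for j in range(len(W[0])):
                    W[i][j] -= lr * dW[i][j]
        history.append((loss, stable_rank(Z)))
    return history


if __name__ == "__main__":
    for step, (loss, srank) in enumerate(train_deep_ufm()):
        if step % 50 == 0:
            print(f"step {step:3d}  loss {loss:.4f}  stable rank {srank:.3f}")
```

Varying `depth` and `width` in a toy like this is one way to probe the depth and width effects described above. Results at this scale are only illustrative.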
📝 Abstract
Neural collapse (NC) describes the structured geometry that emerges in the features and weights of trained classifiers. Recent theory suggests NC can be suboptimal in deep architectures, attributing this to an explicit low-rank bias from L2 regularization. We study the deep unconstrained feature model (UFM), equivalent to a deep linear network with orthogonal inputs, trained without regularization, to isolate how gradient descent and depth alone shape NC. We show that depth induces an implicit low-rank bias: low-rank matrices propagate norm more efficiently through successive multiplications, promoting low-rank alternatives to NC. These alternatives, we argue, correspond to softmax codes: max-margin solutions previously found in width-bottlenecked networks. Analyzing training dynamics under spectral initialization, we identify an early-time repulsion among singular values that drives low-rank emergence, and characterize how depth shrinks NC's basin of attraction. Finally, we show that some effects act in the opposite direction: for randomly initialized networks, increasing width biases training toward higher-rank solutions. Our results provide the first asymptotic and dynamic characterization of implicit bias in deep UFMs trained with unregularized multiclass cross-entropy.
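The claim that low-rank matrices propagate norm more efficiently through successive multiplications can be checked in a deliberately simplified toy setting. The sketch assumes that every layer is the same d × d matrix (c/sqrt(r)) * P_r, where P_r keeps the first r coordinates. This is our simplification, not the paper's construction. Under it, each layer has Frobenius norm c whatever its rank r. The depth-L product then has Frobenius norm sqrt(r) * (c/sqrt(r))^L = c^L * r^((1 - L)/2). This norm does not depend on r when L = 1, and is largest at r = 1 for every L >= 2. The helper names are hypothetical.

```python
import math


def matmul(A, B):
    # Plain-Python product of two matrices stored as lists of rows.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def scaled_projection(d, r, c):
    # (c / sqrt(r)) * P_r: rank r, r equal singular values, Frobenius norm exactly c.
    a = c / math.sqrt(r)
    return [[a if (i == j and i < r) else 0.0 for j in range(d)] for i in range(d)]


def frobenius(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def product_norm(d, r, c, depth):
    # Frobenius norm of W^depth for W = scaled_projection(d, r, c).
    W = scaled_projection(d, r, c)
    P = W
    for _ in range(depth - 1):
        P = matmul(W, P)
    return frobenius(P)


def predicted_norm(r, c, depth):
    # sqrt(r) * (c / sqrt(r))**depth = c**depth * r**((1 - depth) / 2)
    return c ** depth * r ** ((1 - depth) / 2)


d, c = 4, 2.0
for depth in (1, 2, 4):
    norms = [product_norm(d, r, c, depth) for r in range(1, d + 1)]
    print(depth, [round(n, 3) for n in norms])
```

At depth 1 every rank yields norm c. At depth 2 and beyond, the printed norms fall as r grows. Under a fixed per-layer norm budget, the rank-1 chain therefore delivers the most norm to the output, and the gap widens with depth.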
Problem

neural collapse
implicit bias
deep learning
low-rank
softmax codes
Innovation

implicit bias
neural collapse
deep linear networks
softmax codes
low-rank dynamics