Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how Neural Collapse (NC) emerges in shallow ReLU networks trained via gradient flow on orthogonally separable data. The authors formulate a continuous-time gradient flow model for two-layer ReLU networks and rigorously prove that NC necessarily occurs, without relying on the unconstrained-features assumption of prior work: the penultimate-layer features converge to a minimal structure in which within-class features collapse to their class means while the class means become mutually orthogonal, yielding sharp, highly discriminative decision boundaries. The analysis uncovers the interplay between the orthogonal data geometry and the ReLU nonlinearity in inducing NC, and identifies the implicit bias of the gradient flow dynamics as the key driver of this structured convergence. These results provide a new theoretical perspective on the geometric nature of representation learning in deep networks.
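The collapsed-and-orthogonal geometry described above can be checked numerically. The sketch below is an illustration, not the paper's code: the toy features `H` and the metric definitions in `nc_metrics` are hypothetical choices that measure the two properties the summary names, within-class variability around the class means and the cosine similarity between class means.

```python
import numpy as np

def nc_metrics(H, y):
    """Within-class variability and max |cosine| between class means
    for features H of shape (n_samples, d) and integer labels y."""
    classes = np.unique(y)
    means = np.stack([H[y == c].mean(axis=0) for c in classes])
    # Within-class collapse: average squared deviation from class means.
    within = np.mean([np.sum((H[y == c] - means[i]) ** 2)
                      for i, c in enumerate(classes)])
    # Orthogonality: off-diagonal cosine similarities of class means.
    unit = means / np.linalg.norm(means, axis=1, keepdims=True)
    G = unit @ unit.T
    off_diag = G[~np.eye(len(classes), dtype=bool)]
    return within, np.max(np.abs(off_diag))

# Toy features that already satisfy the collapsed, orthogonal geometry:
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
class_means = np.eye(2)                     # mutually orthogonal means
H = class_means[y] + 1e-6 * rng.standard_normal((100, 2))
within, max_cos = nc_metrics(H, y)          # both should be near zero
```

For features produced by a trained network, `within` tending to zero and `max_cos` tending to zero together would witness the NC structure the paper proves.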

📝 Abstract
Among many mysteries behind the success of deep networks lies the exceptional discriminative power of their learned representations as manifested by the intriguing Neural Collapse (NC) phenomenon, where simple feature structures emerge at the last layer of a trained neural network. Prior works on the theoretical understanding of NC have focused on analyzing the optimization landscape of matrix-factorization-like problems by considering the last-layer features as unconstrained free optimization variables and showing that their global minima exhibit NC. In this paper, we show that gradient flow on a two-layer ReLU network for classifying orthogonally separable data provably exhibits NC, thereby advancing prior results in two ways: First, we relax the assumption of unconstrained features, showing the effect of data structure and nonlinear activations on NC characterizations. Second, we reveal the role of the implicit bias of the training dynamics in facilitating the emergence of NC.
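As a concrete illustration of the setting the abstract describes, the sketch below trains a two-layer ReLU network on orthogonally separable data. The construction is an assumption of mine, not the paper's: antipodal clusters (same-class inner products positive, cross-class negative) stand in for orthogonally separable data, and small-step gradient descent on the logistic loss stands in for the continuous-time gradient flow.

```python
import numpy as np

rng = np.random.default_rng(1)

# Antipodal clusters: a simple orthogonally separable dataset.
n, d = 40, 2
u = np.array([1.0, 0.0])
X = np.vstack([u + 0.1 * rng.standard_normal((n // 2, d)),
               -u + 0.1 * rng.standard_normal((n // 2, d))])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

# Two-layer ReLU network f(x) = a . relu(W x), small random init.
m = 16
W = 0.1 * rng.standard_normal((m, d))
a = 0.1 * rng.standard_normal(m)

def loss_and_grads(W, a):
    Z = X @ W.T                        # (n, m) pre-activations
    H = np.maximum(Z, 0.0)             # ReLU features
    f = H @ a                          # network outputs
    s = -y / (1.0 + np.exp(y * f))     # d(logistic loss)/df
    ga = H.T @ s / n
    gW = ((s[:, None] * (Z > 0)) * a).T @ X / n
    return np.mean(np.log1p(np.exp(-y * f))), gW, ga

# Small-step gradient descent as a discretization of gradient flow.
lr = 0.5
loss0, _, _ = loss_and_grads(W, a)
for _ in range(2000):
    _, gW, ga = loss_and_grads(W, a)
    W -= lr * gW
    a -= lr * ga
loss1, _, _ = loss_and_grads(W, a)
acc = np.mean(np.sign(np.maximum(X @ W.T, 0.0) @ a) == y)
```

In the paper's analysis it is the implicit bias of exactly this kind of dynamics, combined with the data geometry and the ReLU nonlinearity, that drives the hidden features `np.maximum(X @ W.T, 0.0)` toward the NC structure.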
Problem

Research questions and friction points this paper is trying to address.

Proving that Neural Collapse emerges in shallow ReLU networks trained by gradient flow
Relaxing the unconstrained-features assumption to account for data structure and nonlinear activations
Understanding the role of the implicit bias of the training dynamics in NC
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous-time gradient flow analysis of two-layer ReLU networks
Exploiting the geometry of orthogonally separable data
Characterizing the implicit bias that drives convergence to NC