🤖 AI Summary
This work investigates the gradient flow dynamics during the early stages of training deep homogeneous neural networks (with order of homogeneity strictly greater than two) under small-weight initialization. Methodologically, it combines gradient flow analysis, properties of homogeneous functions, KKT optimality conditions, and a local Lipschitz assumption on the gradients. Theoretically, it establishes that, under such initialization, the weights remain small in (Euclidean) norm while their direction approximately converges to a KKT point of the neural correlation function. This constitutes the first rigorous guarantee of directional convergence for homogeneous networks of degree greater than two at the onset of training. Furthermore, the paper derives necessary and sufficient conditions for the existence of rank-one KKT points in (Leaky) ReLU and polynomial (Leaky) ReLU networks, and characterizes structural properties of KKT points under these activations. Collectively, these results suggest that implicit optimization biases in deep networks near small initializations are shaped by geometric constraints imposed by the neural correlation function, providing a theoretical foundation for understanding initialization sensitivity and implicit regularization in overparameterized settings.
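In symbols, the early-phase claim can be sketched roughly as follows. The notation (loss \(\mathcal{L}\), network \(\mathcal{H}\), initialization scale \(\delta\)) and the displayed form of the neural correlation function \(\mathcal{N}\) are illustrative assumptions for this summary, not the paper's exact definitions.

```latex
% Hedged sketch of the early-phase statement; notation is illustrative, not the
% paper's exact formulation. H is the network, homogeneous of order L > 2:
%   H(x; c*w) = c^L * H(x; w)   for all c >= 0.
\begin{align*}
  &\dot{\mathbf{w}}(t) = -\nabla \mathcal{L}\bigl(\mathbf{w}(t)\bigr),
     \qquad \mathbf{w}(0) = \delta\, \mathbf{w}_0, \quad 0 < \delta \ll 1, \\
  &\|\mathbf{w}(t)\| \ \text{remains small during the early phase, while} \\
  &\frac{\mathbf{w}(t)}{\|\mathbf{w}(t)\|} \ \text{approximately converges to a KKT point } \mathbf{w}^{\star}
     \text{ of } \ \max_{\|\mathbf{w}\| = 1} \mathcal{N}(\mathbf{w}), \\
  &\text{with stationarity condition} \quad
     \nabla \mathcal{N}(\mathbf{w}^{\star}) = \lambda\, \mathbf{w}^{\star}, \qquad \|\mathbf{w}^{\star}\| = 1 .
\end{align*}
% One common (assumed) instantiation of the neural correlation function:
%   N(w) = sum_i  y_i * H(x_i; w).
```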
📝 Abstract
This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks, assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It is shown here that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in (Euclidean) norm and approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of the recently introduced neural correlation function. This paper also studies the KKT points of the neural correlation function for feed-forward networks with (Leaky) ReLU and polynomial (Leaky) ReLU activations, deriving necessary and sufficient conditions for rank-one KKT points.
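As a purely illustrative companion to the abstract, the sketch below numerically integrates gradient flow (forward Euler) for a small depth-3 linear network, which is homogeneous of degree three in its weights, starting from a small initialization. It tracks the overall weight norm and how the weight direction evolves during the small-weight phase. All choices here (square loss, linear activations, the stopping rule, hyperparameters) are demo assumptions and not the paper's setup.

```python
# Purely illustrative simulation (not the paper's construction): forward-Euler
# integration of gradient flow on a depth-3 linear network f(x) = W3 W2 W1 x,
# which is homogeneous of degree 3 (> 2) in its weights, from a small
# initialization. We track the overall weight norm and the weight direction
# during the small-weight ("early") phase. delta, eta, and the 10x stopping
# rule are arbitrary demo choices.
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic regression task.
n, d, h = 32, 5, 8
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)

# Small initialization: w(0) = delta * w0.
delta = 1e-2
W1 = delta * rng.standard_normal((h, d))
W2 = delta * rng.standard_normal((h, h))
W3 = delta * rng.standard_normal((1, h))

def grads(W1, W2, W3):
    """Square-loss gradients for the linear chain, computed by hand."""
    Z1 = W1 @ X.T                      # (h, n)
    Z2 = W2 @ Z1                       # (h, n)
    r = (W3 @ Z2).ravel() - y          # residuals, (n,)
    D = (r / n)[None, :]               # d(loss)/d(output), (1, n)
    dW3 = D @ Z2.T
    dZ2 = W3.T @ D
    dW2 = dZ2 @ Z1.T
    dW1 = (W2.T @ dZ2) @ X
    return dW1, dW2, dW3

def flat(W1, W2, W3):
    return np.concatenate([W.ravel() for W in (W1, W2, W3)])

eta, n_steps, log_every = 0.05, 40000, 4000   # small step ~ continuous-time flow
init_norm = np.linalg.norm(flat(W1, W2, W3))
snapshots = []
for t in range(n_steps):
    dW1, dW2, dW3 = grads(W1, W2, W3)
    W1, W2, W3 = W1 - eta * dW1, W2 - eta * dW2, W3 - eta * dW3
    w = flat(W1, W2, W3)
    if t % log_every == 0:
        snapshots.append(w)
    if np.linalg.norm(w) > 10 * init_norm:
        print(f"norm grew 10x at step {t}: treating this as the end of the early phase")
        break

# Compare each logged weight direction with the last one recorded in the window.
ref = snapshots[-1] / np.linalg.norm(snapshots[-1])
for k, w in enumerate(snapshots):
    cos = float(w @ ref) / np.linalg.norm(w)
    print(f"step {k * log_every:6d}  ||w|| = {np.linalg.norm(w):.3e}  "
          f"cos(angle to last logged direction) = {cos:+.4f}")
```

A deep linear chain is used only because its degree-3 homogeneity is immediate; the paper's gradient-flow result concerns general homogeneous networks with locally Lipschitz gradients, and its KKT analysis additionally covers (Leaky) ReLU and polynomial (Leaky) ReLU networks.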