Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations

📅 2024-03-12
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work investigates the gradient flow dynamics during the early stages of training deep homogeneous neural networks (homogeneity degree strictly greater than 2) under small-weight initialization. Methodologically, it combines gradient flow analysis, homogeneous function theory, KKT optimality conditions, and a local Lipschitz assumption on the gradients. Theoretically, it establishes that, under such initialization, the norm of the weights remains nearly constant while their direction rapidly converges to a KKT point of the neural correlation function; this constitutes the first rigorous guarantee of directional convergence for high-degree homogeneous networks at the onset of training. Furthermore, the paper derives necessary and sufficient conditions for the existence of rank-one KKT points in ReLU and polynomial-ReLU networks, and systematically characterizes the structural properties of KKT points under common activation functions. Collectively, these results indicate that implicit optimization biases in deep networks arise from geometric constraints imposed by the neural correlation function, providing a theoretical foundation for understanding initialization sensitivity and implicit regularization in overparameterized settings.
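In symbols, a sketch of the setup as we read it (the notation is ours and may differ from the paper's; we assume the neural correlation function N inherits the degree-L homogeneity of the network): the early-training directions are characterized as KKT points of maximizing N over the unit sphere.

```latex
% Constrained problem whose KKT points describe the limiting directions
% (N denotes the neural correlation function; theta the stacked parameters)
\max_{\theta \in \mathbb{R}^{p}} \; \mathcal{N}(\theta)
\quad \text{subject to} \quad \|\theta\|_2^2 = 1 .

% First-order (KKT) stationarity: the gradient aligns with theta,
\nabla \mathcal{N}(\theta^\star) = \lambda\, \theta^\star .

% If N is homogeneous of degree L, Euler's identity
% theta^T grad N(theta) = L N(theta) pins down the multiplier:
\lambda = L\, \mathcal{N}(\theta^\star) .
```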

📝 Abstract
This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It is shown that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in (Euclidean) norm and approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of the recently introduced neural correlation function. This paper also studies the KKT points of the neural correlation function for feed-forward networks with (Leaky) ReLU and polynomial (Leaky) ReLU activations, deriving necessary and sufficient conditions for rank-one KKT points.
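As a rough illustration of the objects in the abstract, the NumPy sketch below builds a small degree-3 homogeneous "polynomial ReLU" network, defines a correlation-style objective in the spirit of the neural correlation function (the data, architecture, and exact form of the objective are our own hypothetical choices, not the paper's), and runs projected gradient ascent on the unit sphere until the KKT stationarity condition ∇N(θ) ≈ λθ holds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data, purely for illustration
n, d, h = 40, 5, 6
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Degree-3 homogeneous network: f(x; W, v) = v . relu(Wx)^2
# (a polynomial-ReLU architecture; scaling (W, v) by c scales f by c^3)
def forward(W, v):
    A = np.maximum(X @ W.T, 0.0)        # (n, h) hidden activations
    return (A ** 2) @ v                 # (n,) network outputs

# Correlation-style objective N(theta) = (1/n) sum_i y_i f(x_i; theta)
def corr(W, v):
    return float(y @ forward(W, v)) / n

def grad_corr(W, v):
    A = np.maximum(X @ W.T, 0.0)
    gv = (A ** 2).T @ y / n                          # dN/dv
    gW = ((2.0 * A * v) * y[:, None]).T @ X / n      # dN/dW
    return gW, gV if False else (gW, gv)[1], gv      # see note below

# (Cleaner version of the return used throughout:)
def grad_corr(W, v):
    A = np.maximum(X @ W.T, 0.0)
    gv = (A ** 2).T @ y / n
    gW = ((2.0 * A * v) * y[:, None]).T @ X / n
    return gW, gv

# Sanity check: N is homogeneous of degree 3 in theta = (W, v)
W0, v0 = rng.normal(size=(h, d)), rng.normal(size=h)
assert np.isclose(corr(2.0 * W0, 2.0 * v0), 8.0 * corr(W0, v0))

# Projected gradient ascent of N over the unit sphere: a direct way to
# reach the KKT directions that, per the paper, small-initialization
# gradient flow approximately converges to in its early phase.
norm = np.sqrt((W0 ** 2).sum() + (v0 ** 2).sum())
tW, tv = W0 / norm, v0 / norm
for _ in range(5000):
    gW, gv = grad_corr(tW, tv)
    tW, tv = tW + 0.05 * gW, tv + 0.05 * gv
    norm = np.sqrt((tW ** 2).sum() + (tv ** 2).sum())
    tW, tv = tW / norm, tv / norm

# KKT stationarity on the sphere: grad N should align with theta,
# so the tangential component of the gradient should (nearly) vanish.
gW, gv = grad_corr(tW, tv)
lam = (gW * tW).sum() + (gv * tv).sum()              # Lagrange multiplier
resid = np.sqrt(((gW - lam * tW) ** 2).sum() + ((gv - lam * tv) ** 2).sum())
print(f"correlation at KKT direction: {corr(tW, tv):.4f}, residual: {resid:.2e}")
```

Note that the simulation finds the limiting direction directly rather than by integrating the training dynamics; reproducing the paper's actual early-phase result would mean running gradient flow on a loss from a tiny initialization and watching the normalized weights approach such a direction.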
Problem

Research questions and friction points this paper is trying to address.

How do the weights of deep homogeneous neural networks evolve under gradient flow during the early stages of training?
Do sufficiently small initializations force the weight direction to converge early to a KKT point of the neural correlation function?
What structure do the KKT points of the neural correlation function have for networks with (Leaky) ReLU and polynomial (Leaky) ReLU activations?
Innovation

Methods, ideas, or system contributions that make the work stand out.

First rigorous directional-convergence guarantee for homogeneous networks of degree strictly greater than two, assuming only locally Lipschitz gradients
Shows weight norms stay nearly constant while directions converge to KKT points of the neural correlation function under small initialization
Derives necessary and sufficient conditions for rank-one KKT points under (Leaky) ReLU and polynomial (Leaky) ReLU activations