🤖 AI Summary
This paper investigates the global convergence of gradient flow for one-hidden-layer ReLU networks under high-dimensional, weakly correlated (in particular, orthogonal) input data. Addressing the realistic setting of low inter-sample correlation, the authors combine, for the first time, a high-dimensional weak-correlation assumption with the Polyak–Łojasiewicz (PL) condition, and establish that global convergence holds with high probability using only logarithmic width, i.e., $O(\log n)$ neurons. They further derive exponential convergence at a rate of order $1/n$ along the gradient-flow trajectory and, under exact orthogonality, uncover a phase transition in the convergence rate: the rate evolves between the orders $1/n$ and $1/\sqrt{n}$, with the transition occurring over a relative time of order $1/\log n$. This work establishes, for the first time, a theoretical framework guaranteeing global convergence for narrow (logarithmic-width) networks on weakly correlated data, offering new insight into the optimization dynamics of neural networks on high-dimensional data.
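For concreteness, a minimal sketch of the kind of setup the summary refers to, in generic notation (the parametrization with symbols $m$, $a_j$, $w_j$ and the squared loss below are illustrative assumptions, not the paper's exact statement):

```latex
% One-hidden-layer ReLU network of width m on n samples (x_i, y_i),
% trained on the squared loss by gradient flow.
% The summary's claim: in high dimension, where the inputs x_i are nearly
% (or exactly) orthogonal, width m = O(log n) already suffices for
% L(theta_t) -> 0 with high probability over the random initialization.
\[
  f(x;\theta) = \sum_{j=1}^{m} a_j \,\max(w_j^\top x,\,0),
  \qquad
  L(\theta) = \frac{1}{2}\sum_{i=1}^{n}\bigl(f(x_i;\theta)-y_i\bigr)^2,
  \qquad
  \dot{\theta}_t = -\nabla L(\theta_t).
\]
```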
📝 Abstract
We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation of the input samples, to demonstrate that a network with width of order $\log(n)$ neurons suffices for global convergence with high probability. Our analysis uses a Polyak–Łojasiewicz viewpoint along the gradient-flow trajectory, which provides an exponential rate of convergence of order $\frac{1}{n}$. When the data are exactly orthogonal, we give a further refined characterization of the convergence speed, proving that its asymptotic behavior lies between the orders $\frac{1}{n}$ and $\frac{1}{\sqrt{n}}$, and exhibiting a phase-transition phenomenon in the convergence rate: it evolves from the lower bound to the upper one over a relative time of order $\frac{1}{\log(n)}$.
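As a worked illustration of how a PL inequality along the trajectory yields an exponential rate (a standard Grönwall-type argument; the PL constant $\mu$ and its $1/n$ scaling below are assumptions restating the abstract, not a quoted result from the paper):

```latex
% If the loss satisfies a Polyak–Łojasiewicz inequality along the
% gradient-flow trajectory,
%   ||grad L(theta_t)||^2 >= 2*mu*L(theta_t)  for all t >= 0,
% then gradient flow d(theta_t)/dt = -grad L(theta_t) gives
% dL/dt = -||grad L||^2 <= -2*mu*L, and Gronwall's lemma yields
% exponential decay with exponent governed by mu, here of order 1/n.
\[
  \frac{d}{dt} L(\theta_t) \;=\; -\,\|\nabla L(\theta_t)\|^2
  \;\le\; -\,2\mu\, L(\theta_t)
  \quad\Longrightarrow\quad
  L(\theta_t) \;\le\; e^{-2\mu t}\, L(\theta_0),
  \qquad \mu \asymp \tfrac{1}{n}.
\]
```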