🤖 AI Summary
This paper investigates the global convergence of gradient flow for one-hidden-layer ReLU networks under high-dimensional, weakly correlated (in particular, orthogonal) input data. Addressing the realistic setting of low inter-sample correlation, the authors combine, for the first time, a high-dimensional weak-correlation assumption with the Polyak–Łojasiewicz (PL) condition, and establish that global convergence holds with high probability using only logarithmic width, i.e., $O(\log n)$ neurons. They further derive exponential convergence at a rate of order $1/n$ along the gradient-flow trajectory and, under exact orthogonality, uncover a phase transition in the convergence rate: the rate evolves between the orders $1/n$ and $1/\sqrt{n}$, with the transition occurring over a relative time of order $1/\log n$. This work establishes, for the first time, a theoretical framework guaranteeing global convergence for narrow (logarithmic-width) networks on weakly correlated data, offering new insight into the optimization dynamics of neural networks on high-dimensional data.
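For concreteness, a minimal sketch of the kind of setup the summary refers to, in generic notation (the parametrization with symbols $m$, $a_j$, $w_j$ and the squared loss below are illustrative assumptions, not the paper's exact statement):

```latex
% One-hidden-layer ReLU network of width m on n samples (x_i, y_i),
% trained on the squared loss by gradient flow.
% The summary's claim: in high dimension, where the inputs x_i are nearly
% (or exactly) orthogonal, width m = O(log n) already suffices for
% L(theta_t) -> 0 with high probability over the random initialization.
\[
  f(x;\theta) = \sum_{j=1}^{m} a_j \,\max(w_j^\top x,\,0),
  \qquad
  L(\theta) = \frac{1}{2}\sum_{i=1}^{n}\bigl(f(x_i;\theta)-y_i\bigr)^2,
  \qquad
  \dot{\theta}_t = -\nabla L(\theta_t).
\]
```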
📝 Abstract
We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation of the input samples, to demonstrate that a network with width of order $\log(n)$ neurons suffices for global convergence with high probability. Our analysis uses a Polyak–Łojasiewicz viewpoint along the gradient-flow trajectory, which provides an exponential rate of convergence of order $\frac{1}{n}$. When the data are exactly orthogonal, we give a further refined characterization of the convergence speed, proving that its asymptotic behavior lies between the orders $\frac{1}{n}$ and $\frac{1}{\sqrt{n}}$, and exhibiting a phase-transition phenomenon in the convergence rate: it evolves from the lower bound to the upper one over a relative time of order $\frac{1}{\log(n)}$.
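As a worked illustration of how a PL inequality along the trajectory yields an exponential rate (a standard Grönwall-type argument; the PL constant $\mu$ and its $1/n$ scaling below are assumptions restating the abstract, not a quoted result from the paper):

```latex
% If the loss satisfies a Polyak–Łojasiewicz inequality along the
% gradient-flow trajectory,
%   ||grad L(theta_t)||^2 >= 2*mu*L(theta_t)  for all t >= 0,
% then gradient flow d(theta_t)/dt = -grad L(theta_t) gives
% dL/dt = -||grad L||^2 <= -2*mu*L, and Gronwall's lemma yields
% exponential decay with exponent governed by mu, here of order 1/n.
\[
  \frac{d}{dt} L(\theta_t) \;=\; -\,\|\nabla L(\theta_t)\|^2
  \;\le\; -\,2\mu\, L(\theta_t)
  \quad\Longrightarrow\quad
  L(\theta_t) \;\le\; e^{-2\mu t}\, L(\theta_0),
  \qquad \mu \asymp \tfrac{1}{n}.
\]
```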