🤖 AI Summary
This paper addresses why wide, shallow neural networks are empirically easy to optimize, and in particular whether their loss landscapes become increasingly convex as width grows. Method: Focusing on single-hidden-layer networks, the authors analyze the epigraph of the input-output map, viewed as a function of the network parameters, and use high-dimensional asymptotic analysis to study the empirical risk. Contribution/Results: They show that, as the hidden-layer width tends to infinity, this epigraph approximates the epigraph of a convex function in a precise sense, so the risk landscape becomes effectively convex over parameter space. This offers a plausible theoretical explanation for the observed fast convergence and scarcity of poor local minima when training wide shallow networks. The work identifies a "width-driven convexification" mechanism, distinct from classical over-parameterization arguments: architectural width itself, not merely parameter redundancy, shapes the optimization geometry. These findings offer a new perspective on the trainability of deep learning models.
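A crude way to see such an effect numerically is to probe midpoint convexity of the empirical risk at random parameter pairs and watch the violations shrink as the width grows. The sketch below is a toy experiment, not the paper's construction: the tanh activation, the 1/m output scaling, the Gaussian data, and the random-pair probe are all assumptions made here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples in d dimensions with scalar targets.
n, d = 200, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def risk(params, m):
    """Empirical squared-error risk of a width-m one-hidden-layer tanh
    network with 1/m output scaling (an assumed scaling; the paper's
    precise model may differ)."""
    W = params[: m * d].reshape(m, d)   # hidden-layer weights
    a = params[m * d :]                 # output-layer weights
    preds = np.tanh(X @ W.T) @ a / m
    return np.mean((preds - y) ** 2)

for m in [4, 16, 64, 256, 1024]:
    p = m * d + m  # total parameter count
    gaps = []
    for _ in range(200):
        t1 = rng.standard_normal(p)
        t2 = rng.standard_normal(p)
        mid = risk((t1 + t2) / 2, m)
        avg = (risk(t1, m) + risk(t2, m)) / 2
        # For a convex risk, mid <= avg; a positive gap is a violation.
        gaps.append(max(mid - avg, 0.0))
    print(f"width {m:5d}: mean midpoint-convexity violation {np.mean(gaps):.4f}")
```

Under this scaling the violations along random chords shrink with width, consistent with the convexification picture; this is only a sanity check on random slices, not the epigraph argument the paper actually makes.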
📝 Abstract
For a simple model of shallow and wide neural networks, we show that the epigraph of the input-output map, as a function of the network parameters, approximates the epigraph of a convex function in a precise sense. This leads to a plausible explanation of the observed good performance of such networks.
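For reference, the epigraph of a function is the set of points on or above its graph. The notational sketch below also writes down a standard one-hidden-layer model as an assumed form; the paper's exact parameterization and scaling are not specified here.

```latex
% Epigraph of a map f : R^p -> R over the parameters theta:
\[
  \operatorname{epi} f
  \;=\;
  \bigl\{ (\theta, t) \in \mathbb{R}^{p} \times \mathbb{R} : t \ge f(\theta) \bigr\}.
\]
% An assumed standard one-hidden-layer network of width m
% (the paper's precise model and scaling may differ):
\[
  f(x;\theta) \;=\; \sum_{i=1}^{m} a_i \,\sigma\!\bigl(w_i^{\top} x\bigr),
  \qquad
  \theta = (a_i, w_i)_{i=1}^{m}.
\]
```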