Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the population loss landscape of high-dimensional two-layer ReLU networks with Gaussian covariates in a realizable teacher–student setting. The authors show that local minima admit an exact low-dimensional representation in terms of summary statistics, and that these minima correspond to attractive fixed points of one-pass SGD dynamics in summary-statistics space. The resulting picture is hierarchical: in the well-specified regime local minima are typically isolated, whereas in the over-parameterized regime they become connected by flat directions as network width grows. This connectivity makes global minima increasingly accessible to the dynamics and reduces convergence to spurious solutions. The findings indicate that common simplifying assumptions can miss essential features of the loss landscape, even in minimal neural network models.

📝 Abstract

We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.
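
To make the "summary statistics" idea concrete, here is a minimal NumPy sketch, not the authors' code. It assumes the summary statistics are the usual weight overlaps ($W W^\top$, $W W_*^\top$, $W_* W_*^\top$, with $W$ the student weights and $W_*$ the teacher weights), which the abstract does not spell out, and it assumes a squared loss with a 1/2 prefactor. Under Gaussian covariates, $\mathbb{E}[\mathrm{ReLU}(a^\top x)\,\mathrm{ReLU}(b^\top x)] = \frac{\|a\|\|b\|}{2\pi}\left(\sin\theta + (\pi - \theta)\cos\theta\right)$, where $\theta$ is the angle between $a$ and $b$ (the degree-1 arc-cosine kernel), so the population loss can be evaluated from overlaps alone. The function names `relu_kernel`, `population_loss` and `one_pass_sgd`, and the learning-rate parameter `lr`, are illustrative choices.

```python
import numpy as np


def relu_kernel(A, B):
    """Matrix of E[ReLU(a.x) * ReLU(b.x)] for x ~ N(0, I), over rows a of A and b of B.

    Uses the degree-1 arc-cosine kernel, so the result depends on A and B only
    through norms and angles, i.e. through the overlaps. Rows are assumed nonzero.
    """
    norms = np.outer(np.linalg.norm(A, axis=1), np.linalg.norm(B, axis=1))
    cos = np.clip((A @ B.T) / norms, -1.0, 1.0)  # clip guards against round-off
    theta = np.arccos(cos)
    return norms * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi)


def population_loss(W, W_star):
    """0.5 * E[(student(x) - teacher(x))^2]; the 0.5 prefactor is an assumed convention."""
    return 0.5 * (
        relu_kernel(W, W).sum()
        - 2.0 * relu_kernel(W, W_star).sum()
        + relu_kernel(W_star, W_star).sum()
    )


def one_pass_sgd(W0, W_star, lr, steps, rng):
    """One-pass SGD on the squared loss: every step uses a fresh Gaussian sample."""
    W = W0.copy()
    d = W.shape[1]
    for _ in range(steps):
        x = rng.standard_normal(d)
        pre = W @ x  # student pre-activations w_k . x
        err = np.maximum(pre, 0.0).sum() - np.maximum(W_star @ x, 0.0).sum()
        # Gradient of 0.5 * err**2 with respect to w_k is err * 1[w_k . x > 0] * x.
        W -= lr * err * np.outer((pre > 0.0).astype(float), x)
    return W
```

Calling `one_pass_sgd` from a random initialisation and evaluating `population_loss` along the way gives a simple way to see whether a run settles at zero loss (a global minimum, e.g. a permutation of the teacher weights) or at a positive plateau. The step size and initial scale are left to the user; the sketch makes no claim about the scalings used in the paper.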

Problem

Research questions and friction points this paper is trying to address.

loss landscape

local minima

two-layer ReLU networks

overparameterization

teacher-student setting

Innovation

Methods, ideas, or system contributions that make the work stand out.

ReLU neural networks

loss landscape

local minima