🤖 AI Summary
This work investigates the geometric structure of the loss landscape near global minima and the convergence behavior of gradient flow for overparameterized two-layer neural networks. We address the phenomenon wherein global minima with zero generalization error become geometrically separated from other global minima as the sample size increases, and establish, for the first time, a quantitative link between this separation and generalization error, going beyond conventional analyses that focus solely on the existence of solutions while neglecting their identifiability. Method: Combining tools from nonconvex optimization, differential geometry, and gradient flow dynamics, we develop a novel local stability analysis framework and a technique for characterizing sample complexity. Contribution/Results: Under mild assumptions, we prove local recoverability: gradient flow converges to a zero-generalization-error solution at a rate jointly determined by the network width and the sample size.
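To make the dynamics concrete, below is a minimal sketch (not the paper's construction) of gradient flow on the empirical loss of a two-layer network, discretized by small Euler steps. The tanh activation, teacher targets, width m, sample size n, and step size are all illustrative assumptions chosen only to show the setting in which the summary's convergence claim lives.

```python
# Minimal sketch, assuming a standard two-layer setup: Euler-discretized
# gradient flow on the empirical squared loss.  All constants below are
# illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 200, 100                          # input dim, hidden width (overparameterized), samples
X = rng.standard_normal((n, d))
y = np.tanh(X @ rng.standard_normal(d))        # hypothetical teacher targets

W = rng.standard_normal((m, d)) / np.sqrt(d)   # hidden-layer weights
a = rng.standard_normal(m) / np.sqrt(m)        # output-layer weights

def loss(W, a):
    return 0.5 * np.mean((np.tanh(X @ W.T) @ a - y) ** 2)

dt = 1e-2                                      # small step approximating d(theta)/dt = -grad R_S(theta)
for _ in range(10000):
    H = np.tanh(X @ W.T)                       # (n, m) hidden activations
    r = H @ a - y                              # residuals on the training sample
    grad_a = H.T @ r / n                       # gradient w.r.t. output weights
    grad_W = ((r[:, None] * (1 - H ** 2) * a).T @ X) / n  # gradient w.r.t. hidden weights
    a -= dt * grad_a
    W -= dt * grad_W

print(f"final empirical loss: {loss(W, a):.3e}")
```

In this toy run the empirical loss is driven toward zero; the paper's question is when, near such a global minimum, the recovered solution also has zero generalization error, and how the local convergence rate depends on m and n.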
📝 Abstract
Under mild assumptions, we investigate the geometry of the loss landscape of two-layer neural networks in the vicinity of global minima. Using novel techniques, we demonstrate: (i) how global minima with zero generalization error become geometrically separated from other global minima as the sample size grows; and (ii) the local convergence properties and convergence rate of gradient flow dynamics. Our results indicate that two-layer neural networks can be locally recovered in the regime of overparameterization.
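For concreteness, a standard formulation of the objects the abstract refers to is sketched below; the notation (width m, sample set S of size n, activation σ) is assumed here and need not match the paper's.

```latex
% Assumed standard setup: width-$m$ two-layer network, $n$ samples,
% squared empirical risk, and gradient flow on that risk.
\[
f(x;\theta) = \sum_{k=1}^{m} a_k\,\sigma\!\left(w_k^{\top} x\right),
\qquad
R_S(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\bigl(f(x_i;\theta)-y_i\bigr)^2,
\qquad
\frac{\mathrm{d}\theta(t)}{\mathrm{d}t} = -\nabla_{\theta} R_S\bigl(\theta(t)\bigr).
\]
```

In this notation, the global minima in question are parameters θ with R_S(θ) = 0, and the results concern how those with zero generalization error separate from the rest as n grows and how fast the gradient flow θ(t) reaches them locally.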