Geometry and Local Recovery of Global Minima of Two-layer Neural Networks at Overparameterization

📅 2023-09-01
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work investigates the geometric structure of the loss landscape near global minima and the convergence behavior of gradient flow for overparameterized two-layer neural networks. We address the phenomenon in which zero-generalization-error global minima become geometrically separated from other global minima as the sample size increases, and establish, for the first time, a quantitative link between this separation and generalization error, going beyond conventional analyses that focus solely on the existence of solutions while neglecting their identifiability. Method: integrating tools from nonconvex optimization, differential geometry, and gradient-flow dynamics, we develop a novel local stability analysis framework and a technique for characterizing sample complexity. Contribution/Results: under mild assumptions, we prove local recoverability: gradient flow converges to a zero-generalization-error solution at a rate jointly determined by network width and sample size.
📝 Abstract
Under mild assumptions, we investigate the geometry of the loss landscape for two-layer neural networks in the vicinity of global minima. Utilizing novel techniques, we demonstrate: (i) how global minima with zero generalization error become geometrically separated from other global minima as the sample size grows; and (ii) the local convergence properties and rate of gradient flow dynamics. Our results indicate that two-layer neural networks can be locally recovered in the regime of overparameterization.
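The abstract's claim of local recovery at overparameterization can be illustrated numerically. The sketch below is not the paper's construction: it simply runs gradient descent (a discretization of gradient flow) on a two-layer `tanh` network whose width `m` far exceeds the sample size `n`, and shows the empirical loss being driven toward zero from a generic initialization. All dimensions, the `1/sqrt(m)` scaling, and the learning rate are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's method): gradient descent as a
# discretization of gradient flow on an overparameterized two-layer network.
# Width m = 200 far exceeds sample size n = 10, so the empirical loss can
# typically be driven to (near) zero from a generic random initialization.
rng = np.random.default_rng(0)

n, d, m = 10, 5, 200           # samples, input dim, hidden width (m >> n)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)          # arbitrary targets; interpolation is the goal

# Two-layer network f(x) = (1/sqrt(m)) * a . tanh(W x)
W = rng.normal(size=(m, d))
a = rng.normal(size=m)

def loss(X, y, W, a):
    r = np.tanh(X @ W.T) @ a / np.sqrt(m) - y
    return 0.5 * np.mean(r ** 2)

lr = 0.5
losses = [loss(X, y, W, a)]
for _ in range(2000):
    H = np.tanh(X @ W.T)               # (n, m) hidden activations
    r = (H @ a / np.sqrt(m) - y) / n   # residual scaled by 1/n
    grad_a = H.T @ r / np.sqrt(m)
    grad_W = ((np.outer(r, a) * (1 - H ** 2)).T @ X) / np.sqrt(m)
    a -= lr * grad_a
    W -= lr * grad_W
    losses.append(loss(X, y, W, a))

print(f"initial loss {losses[0]:.4f} -> final loss {losses[-1]:.6f}")
```

In this overparameterized regime the loss decreases monotonically to near zero, consistent with the local-convergence picture the abstract describes; the paper's actual results concern gradient flow, its rate in terms of width and sample size, and which minima are reached, none of which this toy run establishes.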
Problem

Research questions and friction points this paper is trying to address.

Geometry of loss landscape near global minima
Separation of global minima with zero error
Local recovery in overparameterized neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes loss landscape geometry near global minima
Demonstrates geometric separation of global minima
Shows local recovery in overparameterization regime
Leyang Zhang
School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University; CMA-Shanghai; Shanghai Artificial Intelligence Laboratory
Yaoyu Zhang
Shanghai Jiao Tong University
Deep Learning Theory
Tao Luo
School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University; CMA-Shanghai; Shanghai Artificial Intelligence Laboratory