🤖 AI Summary
Classical statistical learning theory for neural networks typically assumes convergence to a global optimum, and therefore fails to explain the strong generalization observed in practice, where optimization usually reaches only stationary points (e.g., local minima or saddle points).
Method: We conduct a high-dimensional statistical analysis of shallow linear and ReLU networks under empirical risk minimization, characterizing gradient stability and approximate local convexity near stationary points.
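As a concrete (hypothetical) illustration of the object under study, here is a minimal NumPy sketch: gradient descent on the least-squares empirical risk of a shallow ReLU network, stopped once the gradient norm falls below a tolerance ε, i.e., at an approximately stationary point rather than a global optimum. The data-generating process, the network width, and the step size are illustrative assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression data; the data-generating process is an
# illustrative assumption, not the paper's setting.
n, d, m = 200, 5, 20                      # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)

# Shallow ReLU network f(x) = v^T relu(W x); the linear case is analogous.
W = rng.normal(size=(m, d)) / np.sqrt(d)
v = rng.normal(size=m) / np.sqrt(m)

def risk_and_grads(W, v):
    """Least-squares empirical risk and its gradients w.r.t. (W, v)."""
    pre = X @ W.T                         # (n, m) pre-activations
    H = np.maximum(pre, 0.0)              # ReLU features
    r = H @ v - y                         # residuals
    risk = 0.5 * np.mean(r ** 2)
    grad_v = H.T @ r / n
    grad_W = ((pre > 0) * r[:, None] * v[None, :]).T @ X / n
    return risk, grad_W, grad_v

# Plain gradient descent, stopped at an *approximate* stationary point:
# a point whose empirical-risk gradient is small, not a global optimum.
lr, eps = 1e-2, 1e-4
for step in range(50_000):
    risk, gW, gv = risk_and_grads(W, v)
    grad_norm = np.sqrt(np.sum(gW**2) + np.sum(gv**2))
    if grad_norm <= eps:                  # eps-stationarity reached
        break
    W -= lr * gW
    v -= lr * gv

print(f"step {step}: empirical risk {risk:.4f}, ||grad|| {grad_norm:.2e}")
```

The returned parameters are exactly the kind of output a practical pipeline produces: a point with a small empirical-risk gradient, with no certificate of global optimality.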
Contribution/Results: We establish the first statistical guarantees, namely convergence rates that are optimal up to logarithmic factors, for approximate stationary solutions, i.e., points within neighborhoods of stationary points. Our analysis shows that such solutions achieve statistically optimal generalization without requiring global optimization. This gives rigorous theoretical backing to the empirical observation that strong generalization does not necessitate global optimality, substantially narrowing the gap between statistical theory and practical deep learning behavior.
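To make this concrete, the schematic below spells out the notion of an ε-stationary point and the flavor of such a guarantee. The notation (the empirical risk \(\hat L_n\), the statistical risk \(R\), the remainder \(c(\varepsilon)\)) and the displayed rate are our illustrative assumptions; they sketch the shape of the result, not the paper's exact theorem.

```latex
% Schematic only: the notation and the exact form of the rate and the
% remainder c(eps) are illustrative assumptions, not the paper's theorem.

% An approximate stationary point \hat\theta of the empirical risk \hat L_n:
\[
  \bigl\lVert \nabla \hat L_n(\hat\theta) \bigr\rVert_2 \le \varepsilon .
\]

% The guarantees then bound the statistical risk of \hat\theta at the same
% rate, up to logarithmic factors, as a global empirical-risk minimizer:
\[
  R(\hat\theta)
  \;\lesssim\;
  \underbrace{\frac{\operatorname{polylog}(n)}{\sqrt{n}}}_{\text{rate at a global optimum}}
  \;+\;
  \underbrace{c(\varepsilon)}_{\text{vanishes as } \varepsilon \to 0}.
\]
```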
📝 Abstract
Since statistical guarantees for neural networks are usually restricted to global optima of intricate objective functions, it is unclear whether these theories really explain the performance of the actual outputs of neural-network pipelines. The goal of this paper is, therefore, to bring statistical theory closer to practice. We develop statistical guarantees for simple neural networks that match, up to logarithmic factors, those for global optima but apply to stationary points and points nearby. These results give mathematical support to the common notion that neural networks do not necessarily need to be optimized globally. More generally, while currently limited to simple neural networks, our theories take a step toward describing the practical properties of neural networks in mathematical terms.