🤖 AI Summary
This paper addresses the fundamental question of why overparameterized neural networks, despite having enough capacity to drive the training error to zero, still exhibit strong generalization, particularly when interpolating solutions are selected uniformly at random rather than by an optimizer.
Method: We introduce tools from algebraic geometry to characterize the geometric structure of the solution set (an algebraic variety) in parameter space, analyzing its dimension and irreducible components.
Contribution/Results: We rigorously prove that, once the number of training samples exceeds a threshold determined by the intrinsic dimension of the parameter space and the algebraic structure of the model class, the generalization error of a uniformly random interpolator vanishes almost surely. Crucially, this result does not rely on assumptions about optimization dynamics (e.g., SGD’s implicit bias), offering a geometric-probabilistic explanation for the high generalization capability of large models. It establishes that generalization arises fundamentally from the structural properties of the interpolation solution set—not from algorithm-specific inductive biases.
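The threshold phenomenon described above can be illustrated with a toy example. The sketch below uses a *linear* teacher-student model, the simplest case in which the interpolator set is an algebraic variety (here, an affine subspace of parameter space); the linear setting, dimensions, and all names are illustrative assumptions, not the paper's actual model class or proof technique. Below the threshold, a uniformly random interpolator is sampled from a positive-dimensional solution set and generalizes poorly; once the sample count exceeds the parameter dimension, for generic data the variety collapses to the teacher itself, so the generalization error is exactly zero.

```python
import numpy as np

# Toy teacher-student setup (illustrative assumption: noiseless linear model).
rng = np.random.default_rng(0)
d = 5                                   # parameter dimension
w_teacher = rng.standard_normal(d)      # ground-truth teacher parameters

def random_interpolator(n):
    """Sample a random parameter vector that exactly fits n training points."""
    X = rng.standard_normal((n, d))
    y = X @ w_teacher                   # noiseless teacher labels
    w = np.linalg.pinv(X) @ y           # minimum-norm interpolator
    # Add a random component from the null space of X: any such vector still
    # interpolates the data, so this samples from the solution variety.
    _, s, Vt = np.linalg.svd(X)
    rank = int(np.sum(s > 1e-10))
    null_basis = Vt[rank:]              # rows spanning null(X)
    if null_basis.shape[0] > 0:
        w = w + null_basis.T @ rng.standard_normal(null_basis.shape[0])
    return w

def generalization_error(w):
    # For standard Gaussian inputs, the population squared error of a linear
    # model equals ||w - w_teacher||^2 (shown here averaged over coordinates).
    return float(np.mean((w - w_teacher) ** 2))

# Below the threshold (n = 2 < d = 5): the interpolator set is a
# 3-dimensional affine subspace; a random interpolator generalizes poorly.
err_under = generalization_error(random_interpolator(2))

# Above the threshold (n = 8 > d = 5): generically the solution set collapses
# to a single point, the teacher, so the generalization error vanishes.
err_over = generalization_error(random_interpolator(8))

print(f"n=2: error = {err_under:.4f}   n=8: error = {err_over:.2e}")
```

Note the structural parallel to the paper's claim: nothing about the sampling procedure is biased toward the teacher; zero error above the threshold follows purely from the geometry of the interpolator set.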
📝 Abstract
We theoretically demonstrate that the generalization error of interpolators for machine learning models in a teacher-student setting becomes exactly zero once the number of training samples exceeds a certain threshold. Understanding the high generalization ability of large-scale models such as deep neural networks (DNNs) remains one of the central open problems in machine learning theory. While recent theoretical studies have attributed this phenomenon to the implicit bias of stochastic gradient descent (SGD) toward well-generalizing solutions, empirical evidence indicates that it primarily stems from properties of the model itself. Specifically, even randomly sampled interpolators, i.e., parameters that achieve zero training error, have been observed to generalize effectively. In this study, under a teacher-student framework, we prove that the generalization error of randomly sampled interpolators becomes exactly zero once the number of training samples exceeds a threshold determined by the geometric structure of the interpolator set in parameter space. As a proof technique, we leverage tools from algebraic geometry to mathematically characterize this geometric structure.