🤖 AI Summary
This paper addresses the fundamental question of why overparameterized neural networks, despite having enough capacity to drive the training error to zero, still exhibit strong generalization, particularly when interpolating solutions are selected uniformly at random rather than by an optimizer.
Method: We introduce tools from algebraic geometry to characterize the geometric structure of the solution set (an algebraic variety) in parameter space, analyzing its dimension and irreducible components.
Contribution/Results: We rigorously prove that, once the number of training samples exceeds a threshold determined by the intrinsic dimension of the parameter space and the algebraic structure of the model class, the generalization error of a uniformly random interpolator vanishes almost surely. Crucially, this result does not rely on assumptions about optimization dynamics (e.g., SGD’s implicit bias), offering a geometric-probabilistic explanation for the high generalization capability of large models. It establishes that generalization arises fundamentally from the structural properties of the interpolation solution set—not from algorithm-specific inductive biases.
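The threshold phenomenon described above can be illustrated with a toy example. The sketch below uses a *linear* teacher-student model, the simplest case in which the interpolator set is an algebraic variety (here, an affine subspace of parameter space); the linear setting, dimensions, and all names are illustrative assumptions, not the paper's actual model class or proof technique. Below the threshold, a uniformly random interpolator is sampled from a positive-dimensional solution set and generalizes poorly; once the sample count exceeds the parameter dimension, for generic data the variety collapses to the teacher itself, so the generalization error is exactly zero.

```python
import numpy as np

# Toy teacher-student setup (illustrative assumption: noiseless linear model).
rng = np.random.default_rng(0)
d = 5                                   # parameter dimension
w_teacher = rng.standard_normal(d)      # ground-truth teacher parameters

def random_interpolator(n):
    """Sample a random parameter vector that exactly fits n training points."""
    X = rng.standard_normal((n, d))
    y = X @ w_teacher                   # noiseless teacher labels
    w = np.linalg.pinv(X) @ y           # minimum-norm interpolator
    # Add a random component from the null space of X: any such vector still
    # interpolates the data, so this samples from the solution variety.
    _, s, Vt = np.linalg.svd(X)
    rank = int(np.sum(s > 1e-10))
    null_basis = Vt[rank:]              # rows spanning null(X)
    if null_basis.shape[0] > 0:
        w = w + null_basis.T @ rng.standard_normal(null_basis.shape[0])
    return w

def generalization_error(w):
    # For standard Gaussian inputs, the population squared error of a linear
    # model equals ||w - w_teacher||^2 (shown here averaged over coordinates).
    return float(np.mean((w - w_teacher) ** 2))

# Below the threshold (n = 2 < d = 5): the interpolator set is a
# 3-dimensional affine subspace; a random interpolator generalizes poorly.
err_under = generalization_error(random_interpolator(2))

# Above the threshold (n = 8 > d = 5): generically the solution set collapses
# to a single point, the teacher, so the generalization error vanishes.
err_over = generalization_error(random_interpolator(8))

print(f"n=2: error = {err_under:.4f}   n=8: error = {err_over:.2e}")
```

Note the structural parallel to the paper's claim: nothing about the sampling procedure is biased toward the teacher; zero error above the threshold follows purely from the geometry of the interpolator set.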
📝 Abstract
We theoretically demonstrate that the generalization error of interpolators for machine learning models in a teacher-student setting becomes exactly zero once the number of training samples exceeds a certain threshold. Understanding the high generalization ability of large-scale models such as deep neural networks (DNNs) remains one of the central open problems in machine learning theory. While recent theoretical studies have attributed this phenomenon to the implicit bias of stochastic gradient descent (SGD) toward well-generalizing solutions, empirical evidence indicates that it primarily stems from properties of the model itself. Specifically, even randomly sampled interpolators, i.e., parameters that achieve zero training error, have been observed to generalize effectively. In this study, under a teacher-student framework, we prove that the generalization error of randomly sampled interpolators becomes exactly zero once the number of training samples exceeds a threshold determined by the geometric structure of the interpolator set in parameter space. As a proof technique, we leverage tools from algebraic geometry to mathematically characterize this geometric structure.