🤖 AI Summary
This work addresses the theoretical gap concerning the interplay among over-parameterization, stability, and generalization in discontinuous classifiers by introducing two quantifiable robustness measures, class stability and normalized co-stability, that do not rely on smoothness assumptions. Class stability is the expected distance from an input to the decision boundary (the input-space margin), while normalized co-stability is a stronger measure derived from the margin in the codomain. Using these measures, the authors derive generalization bounds for finite and parameterized infinite hypothesis classes, and combine them with an interpolation-based analysis to show that high stability requires substantial over-parameterization (p ≫ n). In particular, they prove that any interpolating solution whose number of parameters p is comparable to the sample size n must be unstable. Empirical results further show that increasing model size improves both stability and test performance, a phenomenon that conventional norm-based measures fail to capture.
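The abstract states the definition of class stability only verbally, as the expected distance to the decision boundary in the input domain. A minimal formalization of that statement, assuming a classifier $f$, a data distribution $\mathcal{D}$, and some norm $\|\cdot\|$ on the input space (the symbol $\mathrm{Stab}$ and the choice of norm are illustrative, not taken from the paper), is

$$
\mathrm{Stab}(f) \;=\; \mathbb{E}_{x \sim \mathcal{D}}\Big[\inf\big\{\, \|x - x'\| \;:\; f(x') \neq f(x) \,\big\}\Big],
$$

so the finite-class generalization bound described below tightens as this expected margin grows, and the law of robustness says that interpolating models with $p \approx n$ cannot make it large.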
📝 Abstract
The relationship between overparameterization, stability, and generalization remains incompletely understood in the setting of discontinuous classifiers. We address this gap by establishing a generalization bound for finite function classes that improves inversely with class stability, defined as the expected distance to the decision boundary in the input domain (margin). Interpreting class stability as a quantifiable notion of robustness, we derive as a corollary a law of robustness for classification that extends the results of Bubeck and Sellke beyond smoothness assumptions to discontinuous functions. In particular, any interpolating model with $p \approx n$ parameters on $n$ data points must be unstable, implying that substantial overparameterization is necessary to achieve high stability. We obtain analogous results for parameterized infinite function classes by analyzing a stronger robustness measure derived from the margin in the codomain, which we refer to as the normalized co-stability. Experiments support our theory: stability increases with model size and correlates with test performance, while traditional norm-based measures remain largely uninformative.
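As a rough illustration of how such a stability measure could be probed empirically (a hypothetical sketch, not the authors' experimental protocol; `model`, `loader`, and `estimate_class_stability` are placeholder names), one can estimate the average distance to the decision boundary by searching for the smallest prediction-flipping perturbation along random directions:

```python
# Illustrative sketch (not the paper's procedure): crudely estimate class stability
# as the average distance from inputs to the model's decision boundary, using
# exponential search plus bisection along random directions for the smallest
# label-flipping perturbation.
import torch

def flip_distance(model, x, y_pred, direction, max_radius=10.0, steps=20, tol=1e-3):
    """Smallest radius r such that model(x + r * direction) changes its prediction,
    found by exponential search for a bracket followed by bisection; returns
    max_radius if no flip is found within that radius."""
    direction = direction / direction.norm()
    lo, hi = 0.0, None
    r = tol
    while r <= max_radius:  # exponential search for an upper bracket on the flip radius
        if model(x + r * direction).argmax(dim=-1) != y_pred:
            hi = r
            break
        lo, r = r, r * 2
    if hi is None:
        return max_radius
    for _ in range(steps):  # bisection refines the flip radius between lo and hi
        mid = 0.5 * (lo + hi)
        if model(x + mid * direction).argmax(dim=-1) != y_pred:
            hi = mid
        else:
            lo = mid
    return hi

@torch.no_grad()
def estimate_class_stability(model, loader, n_directions=8, device="cpu"):
    """Average over inputs of the minimum flip distance over random directions."""
    model.eval()
    margins = []
    for x_batch, _ in loader:
        for x in x_batch.to(device):
            x = x.unsqueeze(0)               # restore the batch dimension
            y_pred = model(x).argmax(dim=-1)
            dists = [
                flip_distance(model, x, y_pred, torch.randn_like(x))
                for _ in range(n_directions)
            ]
            margins.append(min(dists))       # best (smallest) flip distance found
    return sum(margins) / len(margins)
```

Because a random-direction search only upper-bounds each point's true distance to the boundary, a gradient-based or adversarial search would give a tighter estimate; the measurement procedure used in the paper's experiments may differ.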