Harmful Overfitting in Sobolev Spaces

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the generalization behavior of norm-minimizing interpolants in Sobolev spaces under noisy data, revealing a persistent, non-vanishing generalization error (harmful overfitting) even in the large-sample regime. By combining geometric arguments with Sobolev inequalities, the analysis is extended for the first time from the Hilbert space case (\( p = 2 \)) to general Sobolev spaces with arbitrary \( p \in [1, \infty) \). The study identifies harmful neighborhoods near training points where interpolation amplifies noise. Under assumptions on label noise and data distribution regularity, the generalization error of smoothness-preferring interpolants is shown, with high probability, to be bounded below by a positive constant. This underscores the critical role of function space selection in determining generalization performance.

📝 Abstract
Motivated by recent work on benign overfitting in overparameterized machine learning, we study the generalization behavior of functions in Sobolev spaces $W^{k, p}(\mathbb{R}^d)$ that perfectly fit a noisy training data set. Under assumptions of label noise and sufficient regularity in the data distribution, we show that approximately norm-minimizing interpolators, which are canonical solutions selected by smoothness bias, exhibit harmful overfitting: even as the training sample size $n \to \infty$, the generalization error remains bounded below by a positive constant with high probability. Our results hold for arbitrary values of $p \in [1, \infty)$, in contrast to prior results studying the Hilbert space case ($p = 2$) using kernel methods. Our proof uses a geometric argument which identifies harmful neighborhoods of the training data using Sobolev inequalities.
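As a notational sketch of the setup described in the abstract (the symbols $f^*$, $\varepsilon_i$, and $c$ are illustrative assumptions, not taken from the paper's full text), the smoothness-biased interpolation problem can be written as:

```latex
% Noisy labels: y_i = f^*(x_i) + \varepsilon_i for some underlying target f^*.
% Smoothness-preferring interpolant: minimize the Sobolev norm subject to exact fit.
\hat{f}_n \in \operatorname*{arg\,min}_{f \in W^{k,p}(\mathbb{R}^d)} \; \|f\|_{W^{k,p}}
\quad \text{subject to} \quad f(x_i) = y_i, \qquad i = 1, \dots, n.
```

The harmful-overfitting claim is then that, with high probability, the generalization error of such (approximately) norm-minimizing $\hat{f}_n$ remains bounded below by some constant $c > 0$ even as $n \to \infty$, for every $p \in [1, \infty)$.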
Problem

Research questions and friction points this paper is trying to address.

harmful overfitting
Sobolev spaces
generalization error
interpolation
label noise
Kedar Karhadkar
Department of Mathematics, University of California, Los Angeles, Los Angeles, CA, USA
Alexander Sietsema
Department of Mathematics, University of California, Los Angeles, Los Angeles, CA, USA
Deanna Needell
Professor of Mathematics, UCLA
Mathematical signal processing, statistics, compressed sensing, numerical linear algebra
Guido Montufar
Department of Mathematics, University of California, Los Angeles, Los Angeles, CA, USA; Department of Statistics & Data Science, University of California, Los Angeles, Los Angeles, CA, USA