Precise asymptotic analysis of Sobolev training for random feature models

📅 2025-11-04
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work investigates the impact of Sobolev training (simultaneously fitting function values and gradients) on the generalization error of random feature models in the high-dimensional, overparameterized regime. In the asymptotic limit where the number of parameters, the input dimension, and the sample size grow proportionally to infinity, the authors combine the replica method from statistical physics, linearizations from operator-valued free probability, and high-dimensional random matrix analysis to derive, for the first time, an exact closed-form characterization of the generalization error under Sobolev training. The analysis reveals that gradient supervision does not universally improve performance: it helps only within a specific window of overparameterization, and gradient information becomes ineffective when overparameterization is either insufficient or excessive. The results identify regimes in which such models perform optimally by interpolating noisy function and gradient data, and they establish theoretical criteria and design guidelines for gradient-augmented learning.
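
To make the setup concrete, here is a minimal sketch (not the paper's implementation) of Sobolev training for a random feature model f(x) = a^T sigma(W x) with frozen random weights W: the trainable coefficients a are fit by ridge regression on stacked function-value and gradient observations. The tanh activation, single-index target, noise levels, ridge penalty, and problem sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n = 50, 200, 100            # input dim, features, samples (illustrative)
lam = 1e-3                        # ridge penalty (assumed)

sigma = np.tanh                   # activation choice is an assumption
dsigma = lambda z: 1.0 - np.tanh(z) ** 2

# Frozen random first-layer weights of the RF model f(x) = a^T sigma(W x).
W = rng.standard_normal((p, d)) / np.sqrt(d)

# Single-index target y = phi(<theta, x>), whose gradient is phi'(<theta, x>) * theta.
theta = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
s = X @ theta
y = np.tanh(s) + 0.1 * rng.standard_normal(n)                       # noisy values
G = dsigma(s)[:, None] * theta + 0.1 * rng.standard_normal((n, d))  # noisy gradients

# Function-value design: Phi[i, j] = sigma(w_j . x_i).
Z = X @ W.T
Phi = sigma(Z)

# Gradient design: d f(x_i) / d x_k = sum_j a_j * sigma'(w_j . x_i) * W[j, k].
Psi = (dsigma(Z)[:, None, :] * W.T[None, :, :]).reshape(n * d, p)

# Sobolev (joint) ridge regression on stacked value and gradient rows.
A = np.vstack([Phi, Psi])
b = np.concatenate([y, G.ravel()])
a = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)
```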

📝 Abstract
Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training -- regression with both function and gradient data -- on the generalization error of highly overparameterized predictive models in high dimensions. In this paper, we obtain a precise characterization of this training modality for random feature (RF) models in the limit where the number of trainable parameters, input dimensions, and training data tend proportionally to infinity. Our model for Sobolev training reflects practical implementations by sketching gradient data onto finite dimensional subspaces. By combining the replica method from statistical physics with linearizations in operator-valued free probability theory, we derive a closed-form description for the generalization errors of the trained RF models. For target functions described by single-index models, we demonstrate that supplementing function data with additional gradient data does not universally improve predictive performance. Rather, the degree of overparameterization should inform the choice of training method. More broadly, our results identify settings where models perform optimally by interpolating noisy function and gradient data.
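
The abstract's "sketching gradient data onto finite dimensional subspaces" can be illustrated as follows: rather than matching all d gradient coordinates at each sample, only m projected directional derivatives S @ grad f(x_i) are matched to S @ g_i, for a sketch matrix S. The Gaussian sketch and the sizes below are assumptions for illustration; the paper's sketching operator may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
d, p, n, m = 50, 200, 100, 5      # m sketch directions per sample (assumed)

sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2
W = rng.standard_normal((p, d)) / np.sqrt(d)
X = rng.standard_normal((n, d))
Z = X @ W.T

# Gaussian sketch: only S @ grad f(x_i) is supervised, not the full gradient.
S = rng.standard_normal((m, d)) / np.sqrt(d)

# Full gradient design blocks of shape (n, d, p), projected down to m rows each.
Psi_full = dsigma(Z)[:, None, :] * W.T[None, :, :]                # (n, d, p)
Psi_sketch = np.einsum("md,ndp->nmp", S, Psi_full).reshape(n * m, p)

# The sketched gradient targets would likewise be S @ g_i per sample, and
# Psi_sketch replaces the full gradient block in the ridge system.
```
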
Problem

Research questions and friction points this paper is trying to address.

Analyzes Sobolev training impact on overparameterized models' generalization error
Studies gradient-augmented regression for random feature models in high dimensions
Determines when gradient data improves prediction in overparameterized regimes (see the numerical sketch after this list)
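
As a rough empirical counterpart to the third question, one can sweep the overparameterization ratio p/n and compare test error with and without gradient supervision. This finite-size experiment is far from the paper's proportional asymptotics, and the activation, target, noise levels, and sizes are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, noise = 30, 120, 0.3        # illustrative sizes, not the asymptotic limit

sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2
theta = rng.standard_normal(d) / np.sqrt(d)

def make_data(n):
    # Noisy function values and gradients of a single-index target.
    X = rng.standard_normal((n, d))
    s = X @ theta
    y = np.tanh(s) + noise * rng.standard_normal(n)
    G = dsigma(s)[:, None] * theta + noise * rng.standard_normal((n, d))
    return X, y, G

def fit_rf(p, X, y, G=None, lam=1e-4):
    # Fresh random features per fit; G=None means value-only training.
    W = rng.standard_normal((p, d)) / np.sqrt(d)
    Z = X @ W.T
    A, b = sigma(Z), y
    if G is not None:             # append gradient rows (Sobolev training)
        Psi = (dsigma(Z)[:, None, :] * W.T[None, :, :]).reshape(-1, p)
        A, b = np.vstack([A, Psi]), np.concatenate([y, G.reshape(-1)])
    a = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)
    return W, a

Xtr, ytr, Gtr = make_data(n)
Xte = rng.standard_normal((2000, d))
yte = np.tanh(Xte @ theta)        # noiseless test targets

for p in [30, 120, 480, 1920]:    # sweep the overparameterization ratio p/n
    W, a = fit_rf(p, Xtr, ytr)            # function data only
    Wg, ag = fit_rf(p, Xtr, ytr, Gtr)     # function + gradient data
    err = np.mean((sigma(Xte @ W.T) @ a - yte) ** 2)
    errg = np.mean((sigma(Xte @ Wg.T) @ ag - yte) ** 2)
    print(f"p/n = {p / n:5.1f}:  value-only {err:.4f}   Sobolev {errg:.4f}")
```
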
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sobolev training with sketched gradient data
Replica method combined with free probability theory
Closed-form generalization error for overparameterized models
Katharine Fisher
Center for Computational Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Matthew T.C. Li
Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, MA 01003, USA
Youssef Marzouk
Professor, Massachusetts Institute of Technology
computational mathematics, uncertainty quantification, inverse problems, data assimilation, Bayesian statistics
Timo Schorlepp
Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA