Uniform-in-time concentration in two-layer neural networks via transportation inequalities

📅 2026-03-02
🤖 AI Summary
This work quantifies the discrepancy between the predictions of a two-layer neural network trained via stochastic gradient descent (SGD) and their mean-field limit, for quadratic loss with ridge regularization. The authors establish Talagrand-type transportation inequalities for the law of the SGD parameters, with explicit constants independent of the iteration index. From these they prove uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein-1 (W₁) distance, together with analogous bounds in the sliced Wasserstein-1 (SW₁) distance that lead to dimension-free rates. The concentration results translate into high-probability, uniform-in-time bounds on the prediction error against a fixed test function. The analysis thus provides non-asymptotic guarantees for SGD in the mean-field framework.
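
For orientation, the quantities named in this summary can be written out using common textbook conventions. This is only a sketch: the constant $C$, the symbols $\mu$, $\nu$, $P_u$, and the normalization of the sphere measure $\sigma$ are assumptions made here, and the paper's own conventions may differ.

```latex
% T_p(C) transportation inequality for a probability measure \mu
% (H denotes relative entropy; conventions for the constant vary):
\[
  W_p(\nu, \mu) \le \sqrt{2C\, H(\nu \mid \mu)}
  \quad \text{for all } \nu \ll \mu,
  \qquad
  H(\nu \mid \mu) = \int \log\frac{d\nu}{d\mu}\, d\nu .
\]

% Kantorovich-Rubinstein duality for the Wasserstein-1 distance:
\[
  W_1(\nu, \mu) = \sup_{\|f\|_{\mathrm{Lip}} \le 1}
  \left( \int f \, d\nu - \int f \, d\mu \right).
\]

% Sliced Wasserstein-1 distance: average of one-dimensional W_1 distances
% over projections P_u(x) = <u, x>, with \sigma the uniform probability
% measure on the unit sphere (assumed normalization):
\[
  \mathrm{SW}_1(\nu, \mu) = \int_{\mathbb{S}^{d-1}}
  W_1\big( (P_u)_{\#}\nu,\, (P_u)_{\#}\mu \big)\, \sigma(du).
\]
```

Under the additional assumption that the test function $\Phi$ is Lipschitz, the duality formula gives $\left| \int \Phi \, d\nu - \int \Phi \, d\mu \right| \le \|\Phi\|_{\mathrm{Lip}}\, W_1(\nu, \mu)$, which is one way a $W_1$ concentration bound can control an error measured against a fixed test function.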

📝 Abstract
We quantify, uniformly over time and with high probability, the discrepancy between the predictions of a two-layer neural network trained by stochastic gradient descent (SGD) and their mean-field limit, for quadratic loss and ridge regularization. As a key ingredient, we establish $T_p$ transportation inequalities ($p \in \{1, 2\}$) for the law of the SGD parameters, with explicit constants independent of the iteration index. We then prove uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance $W_1$, and we translate these bounds into prediction-error estimates against a fixed test function $\Phi$. We also derive analogous concentration bounds in the sliced-Wasserstein distance $\mathrm{SW}_1$, leading to dimension-free rates.
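
To make the sliced-Wasserstein distance more concrete, here is a minimal Python sketch of a Monte Carlo estimate of $\mathrm{SW}_1$ between two empirical measures with the same number of atoms. It is not taken from the paper: the function names `w1_1d` and `sliced_w1`, the number of random directions, and the uniform averaging over the sphere are all illustrative assumptions.

```python
import numpy as np


def w1_1d(x, y):
    """W1 between two 1-D empirical measures with the same number of atoms.

    In one dimension the optimal coupling matches sorted samples, so W1 is
    the mean absolute difference of the order statistics.
    """
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))


def sliced_w1(X, Y, n_projections=200, seed=0):
    """Monte Carlo estimate of SW1 between the empirical measures of X and Y.

    X, Y: arrays of shape (n, d) holding n atoms in dimension d (assumed to
    have the same n). Directions are drawn uniformly on the unit sphere by
    normalizing Gaussian vectors.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    directions = rng.normal(size=(n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Average the 1-D W1 distances of the projected samples.
    total = 0.0
    for u in directions:
        total += w1_1d(X @ u, Y @ u)
    return total / n_projections


# Hypothetical usage: two clouds of 500 "parameters" in dimension 10,
# the second one shifted by a fixed vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
shift = np.full(10, 0.5)
print(sliced_w1(X, X + shift))
```

Each projected problem is one-dimensional and is solved by sorting, so the cost of the estimate grows with the number of projections rather than with a high-dimensional optimal transport solve.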
Problem

Research questions and friction points this paper is trying to address.

two-layer neural networks
mean-field limit
uniform-in-time concentration
stochastic gradient descent
Wasserstein distance
Innovation

Methods, ideas, or system contributions that make the work stand out.

transportation inequalities
uniform-in-time concentration
mean-field limit
Wasserstein distance
two-layer neural networks
Arnaud Guillin
Professor of Mathematics, Université Clermont-Auvergne
Probability, Statistics
Boris Nectoux
Université Clermont-Auvergne, CNRS UMR 6620, LMBP, Clermont-Ferrand, France
Paul Stos
Université Clermont-Auvergne, CNRS UMR 6620, LMBP, Clermont-Ferrand, France