🤖 AI Summary
This work quantifies the discrepancy between the predictions of a two-layer neural network trained by stochastic gradient descent (SGD) and those of its mean-field limit, via the empirical distribution of the network parameters. By establishing Talagrand-type transport inequalities along the SGD trajectory, the authors obtain, for the first time, explicit constants independent of the number of iterations. They prove time-uniform concentration of the empirical parameter measure around the mean-field limit in the Wasserstein-1 (W₁) distance, and analogous bounds in the sliced Wasserstein-1 (SW₁) distance that yield dimension-free rates. These concentration results translate into high-probability, time-uniform bounds on the prediction error against a fixed test function. The analysis provides new non-asymptotic guarantees for SGD in the mean-field regime.
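To make the setting concrete, here is a minimal sketch (not the authors' code) of the object being studied: a two-layer network in mean-field scaling, f(x) = (1/N) Σᵢ aᵢ σ(wᵢ·x), trained by online SGD on the quadratic loss with ridge regularization. The data model, activation, and all hyperparameters are illustrative assumptions; the empirical measure of the particles (aᵢ, wᵢ) is what the paper shows concentrates around its mean-field limit.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 20          # number of neurons, input dimension (assumed values)
lr, lam = 0.1, 1e-3      # step size and ridge penalty (assumed values)

# Particles theta_i = (a_i, w_i); their empirical measure (1/N) sum_i delta_{theta_i}
# is the object whose concentration around the mean-field limit is quantified.
a = rng.normal(size=N)
W = rng.normal(size=(N, d)) / np.sqrt(d)

def predict(x):
    """Mean-field-scaled prediction: average of a_i * relu(w_i . x) over neurons."""
    return np.mean(a * np.maximum(W @ x, 0.0))

for step in range(10_000):
    x = rng.normal(size=d)                  # fresh sample (online SGD)
    y = np.tanh(x[0])                       # toy target, purely illustrative
    pre = np.maximum(W @ x, 0.0)            # relu(w_i . x)
    err = predict(x) - y                    # residual of the quadratic loss
    # Gradients of (1/2)(f(x)-y)^2 + (lam/2N) sum_i ||theta_i||^2 w.r.t. theta_i
    # carry a 1/N factor; scaling the step size by N (a common mean-field
    # convention) makes the per-neuron update O(1), so the 1/N cancels below.
    grad_a = err * pre + lam * a
    grad_W = (err * a * (pre > 0))[:, None] * x[None, :] + lam * W
    a -= lr * grad_a
    W -= lr * grad_W
```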
📝 Abstract
We quantify, uniformly over time and with high probability, the discrepancy between the predictions of a two-layer neural network trained by stochastic gradient descent (SGD) and their mean-field limit, for quadratic loss and ridge regularization. As a key ingredient, we establish $T_p$ transportation inequalities ($p \in \{1, 2\}$) for the law of the SGD parameters, with explicit constants independent of the iteration index. We then prove uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance $W_1$, and we translate these bounds into prediction-error estimates against a fixed test function $\Phi$. We also derive analogous concentration bounds in the sliced-Wasserstein distance $SW_1$, leading to dimension-free rates.
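As a complement to the distances named in the abstract, the following is a minimal sketch (an assumption-laden illustration, not the paper's code) of how one might empirically estimate the sliced-Wasserstein-1 distance $SW_1$ between two equal-size parameter clouds, e.g. the SGD particles at some iterate versus i.i.d. samples from a candidate limiting law.

```python
import numpy as np

def w1_1d(u, v):
    """Exact Wasserstein-1 distance between two 1-D samples of equal size:
    mean absolute difference of the sorted values (quantile coupling)."""
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

def sliced_w1(X, Y, n_proj=200, rng=None):
    """Monte Carlo estimate of SW_1: average the 1-D W_1 distance of the two
    point clouds projected onto random directions on the unit sphere."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    dirs = rng.normal(size=(n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return np.mean([w1_1d(X @ u, Y @ u) for u in dirs])

# Illustrative usage: two clouds of N particles in dimension d (toy data only).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))            # e.g. SGD parameters at some iterate
Y = 1.05 * rng.normal(size=(1000, 20))     # e.g. samples from the limiting law
print(sliced_w1(X, Y, rng=rng))
```

Averaging one-dimensional $W_1$ distances over random projections is what makes $SW_1$ cheap to estimate and is consistent with the dimension-free flavor of the bounds stated above.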