🤖 AI Summary
What is the convergence rate of the generalization error of deep ReLU feedforward networks with respect to the sample size $n$?
Method: We combine Rademacher complexity theory, parameter sensitivity modeling, and large-scale empirical validation across diverse datasets.
Contribution/Results: We provide the first rigorous theoretical and empirical demonstration that the generalization error converges at rate $1/\sqrt{n}$ rather than the classical $1/n$ rate typical of parametric models, thereby revealing the fundamental origin of deep networks' "data hunger." This scaling law is consistently validated across standard benchmarks including CIFAR-10, CIFAR-100, and ImageNet. Our work establishes the first quantitative theoretical characterization, with corresponding empirical confirmation, of the data-efficiency boundary for deep learning, offering foundational insights into the statistical limitations of overparameterized neural networks.
📝 Abstract
Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate $1/\sqrt{n}$ in the sample size $n$ rather than the usual "parametric rate" $1/n$. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.
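To make the claimed scaling concrete, here is a minimal, purely illustrative sketch (synthetic numbers, not the paper's experimental data): if the generalization error follows $\mathrm{err}(n) = c/\sqrt{n}$, then the slope of $\log \mathrm{err}$ against $\log n$ is $-1/2$, whereas a parametric rate $c/n$ would give slope $-1$. The constant `c` and the sample sizes below are arbitrary assumptions chosen only to exhibit the exponent.

```python
import math

def loglog_slope(n1, e1, n2, e2):
    """Estimate the exponent a in err(n) ~ c * n**a from two (n, err) pairs,
    via the slope in log-log coordinates."""
    return (math.log(e2) - math.log(e1)) / (math.log(n2) - math.log(n1))

c = 3.0  # arbitrary constant for the synthetic error curve (assumption)

# Synthetic errors following the paper's claimed 1/sqrt(n) rate:
err_sqrt = lambda n: c / math.sqrt(n)
# Synthetic errors following the classical parametric 1/n rate, for contrast:
err_param = lambda n: c / n

print(loglog_slope(1_000, err_sqrt(1_000), 100_000, err_sqrt(100_000)))   # ~ -0.5
print(loglog_slope(1_000, err_param(1_000), 100_000, err_param(100_000)))  # ~ -1.0
```

The practical reading of the slope: at rate $1/\sqrt{n}$, halving the generalization error requires roughly four times as many samples, rather than the twice as many that the $1/n$ rate would suggest.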