🤖 AI Summary
This paper addresses the challenge of training nonparametric additive models for regression under limited memory and computational budgets. We propose an iterative algorithm based on stochastic gradient descent (SGD) in function space, representing each additive component by a truncated basis expansion and choosing the learning rate across three distinct stages of training for robust optimization. Theoretically, we establish an SGD convergence framework for nonparametric regression that tolerates model misspecification: under correct specification, the estimator achieves the minimax-optimal rate $n^{-2s/(2s+1)}$, where $s$ denotes the smoothness order of the component functions; under misspecification, it satisfies an oracle inequality. The resulting risk bounds make the dependence on the data dimensionality and the sample size $n$ explicit, and polynomial convergence rates are retained even when the covariates do not have full support on their domain.
📝 Abstract
This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient descent, applied to the coefficients of a truncated basis expansion of the component functions. We show that the resulting estimator satisfies an oracle inequality that allows for model mis-specification. In the well-specified setting, by choosing the learning rate carefully across three distinct stages of training, we demonstrate that its risk is minimax optimal in terms of the dependence on the dimensionality of the data and the size of the training sample. We also provide polynomial convergence rates even when the covariates do not have full support on their domain.
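For concreteness, here is a minimal sketch of the kind of functional SGD update described above: each additive component is stored as coefficients of a truncated basis expansion, and each observation triggers a single stochastic gradient step on those coefficients. The cosine basis, the truncation level, and the specific warm-up/constant/decay constants in `step_size` are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def cosine_basis(x, num_basis):
    """Evaluate the first `num_basis` cosine basis functions at x in [0, 1]."""
    k = np.arange(1, num_basis + 1)
    return np.sqrt(2.0) * np.cos(np.pi * k * x)

def step_size(t, n, eta0=0.5):
    """Hypothetical three-stage schedule: warm-up, constant, then polynomial decay."""
    if t < n // 10:                      # stage 1: small warm-up steps
        return eta0 * t / max(n // 10, 1)
    elif t < n // 2:                     # stage 2: constant steps
        return eta0
    else:                                # stage 3: decaying steps
        return eta0 * (n // 2) ** 0.5 / t ** 0.5

def functional_sgd(X, y, num_basis=10, eta0=0.5):
    """One pass of SGD over the basis coefficients of each additive component.

    X: (n, d) array of covariates scaled to [0, 1]; y: (n,) responses.
    Returns a (d, num_basis) array of fitted coefficients.
    """
    n, d = X.shape
    theta = np.zeros((d, num_basis))     # coefficients of each component function
    for t in range(n):
        # Basis evaluations for every coordinate of the current observation.
        Phi = np.stack([cosine_basis(X[t, j], num_basis) for j in range(d)])
        pred = np.sum(theta * Phi)       # additive prediction at X[t]
        resid = y[t] - pred
        # Stochastic gradient step on the squared-error loss.
        theta += step_size(t + 1, n, eta0) * resid * Phi
    return theta
```

As a usage example, `theta = functional_sgd(X, y)` fits one coefficient vector per covariate in a single pass, so memory scales with the number of components times the truncation level rather than with the sample size.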