Stochastic Gradient Descent for Nonparametric Regression

📅 2024-01-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of efficiently training nonparametric additive models for nonparametric regression under limited memory and computational budgets. It proposes an iterative algorithm based on stochastic gradient descent (SGD) in function space, representing each additive component via a truncated basis expansion and employing a three-stage adaptive learning-rate schedule for robust optimization. Theoretically, it establishes the first SGD convergence framework for nonparametric regression with tolerance for model misspecification: under correct specification, the estimator achieves the minimax-optimal rate $n^{-2s/(2s+1)}$, where $s$ denotes the smoothness order; under misspecification, it satisfies an oracle inequality. Crucially, these results hold uniformly across arbitrary dimensionality and sample size $n$, and retain polynomial convergence rates even when the covariates do not have full support on their domain.

📝 Abstract
This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient descent, applied to the coefficients of a truncated basis expansion of the component functions. We show that the resulting estimator satisfies an oracle inequality that allows for model mis-specification. In the well-specified setting, by choosing the learning rate carefully across three distinct stages of training, we demonstrate that its risk is minimax optimal in terms of the dependence on the dimensionality of the data and the size of the training sample. We also provide polynomial convergence rates even when the covariates do not have full support on their domain.
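The procedure described above — SGD applied to the coefficients of a truncated basis expansion of each additive component, with a staged learning rate — can be sketched on a toy problem as follows. The cosine basis, truncation level `M`, and the specific warm-up/constant/decay schedule are illustrative assumptions for this sketch, not the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy additive regression: y = sin(2*pi*x1) + cos(2*pi*x2) + noise
n, d, M = 2000, 2, 5              # sample size, covariates, truncation level
X = rng.uniform(size=(n, d))
y = (np.sin(2 * np.pi * X[:, 0])
     + np.cos(2 * np.pi * X[:, 1])
     + 0.1 * rng.standard_normal(n))

def basis(x, M):
    # Truncated cosine basis on [0, 1] (illustrative choice of basis)
    return np.cos(np.pi * np.arange(1, M + 1) * x)

def learning_rate(t, n):
    # Hypothetical three-stage schedule: warm-up, constant, then decay
    warm, const = n // 10, n // 2
    if t < warm:
        return 0.1 * (t + 1) / warm
    if t < const:
        return 0.1
    return 0.1 / np.sqrt(t - const + 1)

theta = np.zeros((d, M))          # basis coefficients of each component
mu = 0.0                          # intercept
for t in range(n):                # single SGD pass over the data stream
    Phi = np.stack([basis(X[t, j], M) for j in range(d)])   # shape (d, M)
    resid = mu + np.sum(theta * Phi) - y[t]                 # pointwise residual
    eta = learning_rate(t, n)
    mu -= eta * resid             # gradient step on the intercept
    theta -= eta * resid * Phi    # gradient step on each component's coefficients

# In-sample squared error of the fitted additive model
preds = mu + sum(basis(X[:, j, None], M) @ theta[j] for j in range(d))
mse = np.mean((preds - y) ** 2)
```

Each update costs $O(dM)$ time and the state is only the $d \times M$ coefficient array, which is the favorable memory/compute profile the abstract refers to; the staged schedule keeps early steps small, then lets the rate decay so the stochastic noise averages out.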
Problem

Research questions and friction points this paper is trying to address.

Develops stochastic gradient descent for nonparametric regression models
Ensures minimax optimal risk with careful learning rate stages
Provides polynomial convergence without full covariate support
Innovation

Methods, ideas, or system contributions that make the work stand out.

Functional stochastic gradient descent algorithm
Truncated basis expansion for components
Minimax optimal risk via staged learning