🤖 AI Summary
Existing methods for online nonparametric estimation on streaming data lack efficient, adaptive hyperparameter selection mechanisms.
Method: We propose Weighted Rolling Validation (WRV), a low-overhead, fully online model-selection framework that generalizes leave-one-out cross-validation to the streaming setting via temporal weighting of historical validation samples. Grounded in general statistical stability assumptions, WRV assigns weights that diverge over time, relatively down-weighting older validation errors, without requiring additional storage or retraining, and it is compatible with stochastic gradient-based nonparametric estimators.
Contribution/Results: We establish theoretical guarantees showing WRV achieves adaptive convergence rates. Empirically, WRV exhibits high sensitivity to subtle performance differences among candidate estimators, incurs negligible computational overhead, and significantly improves prediction accuracy and robustness. To our knowledge, WRV is the first lightweight, theoretically grounded hyperparameter adaptation mechanism for online nonparametric learning.
📝 Abstract
Online nonparametric estimators are gaining popularity due to their efficient computation and competitive generalization abilities. Important examples include variants of stochastic gradient descent. These algorithms often take one sample point at a time and incrementally update the parameter estimate of interest. In this work, we consider model selection/hyperparameter tuning for such online algorithms. We propose a weighted rolling validation procedure, an online variant of leave-one-out cross-validation, that costs minimal extra computation for many typical stochastic gradient descent estimators and maintains their online nature. Similar to batch cross-validation, it can boost base estimators to achieve better empirical performance and an adaptive convergence rate. Our analysis is straightforward, relying mainly on some general statistical stability assumptions. The simulation study underscores the significance of diverging weights in practice and demonstrates the procedure's favorable sensitivity even when the difference between candidate estimators is slim.
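The rolling-validation idea above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the names (`weighted_rolling_validation`, `SGDLinear`) and the polynomial weight choice `w_t = t**xi` are assumptions for the example. Each candidate estimator predicts each incoming sample *before* updating on it, so the same single pass over the stream produces both the fitted estimators and their weighted validation errors.

```python
import numpy as np


def weighted_rolling_validation(stream, candidates, xi=0.1):
    """Select among online estimators via weighted rolling validation.

    At each step t, every candidate predicts the incoming sample before
    updating on it (a rolling, one-pass validation error), and squared
    errors are accumulated with diverging weights w_t = t**xi, which
    relatively down-weight older validation errors.  Returns the
    candidate with the smallest weighted rolling error.
    """
    errors = np.zeros(len(candidates))
    for t, (x, y) in enumerate(stream, start=1):
        w = t ** xi  # diverging weight: recent errors count more
        for i, est in enumerate(candidates):
            errors[i] += w * (est.predict(x) - y) ** 2  # validate first
            est.update(x, y)                            # then update
    return candidates[int(np.argmin(errors))]


class SGDLinear:
    """Toy online least-squares estimator (one SGD step per sample)."""

    def __init__(self, lr, dim):
        self.lr = lr
        self.w = np.zeros(dim)

    def predict(self, x):
        return self.w @ x

    def update(self, x, y):
        self.w += self.lr * (y - self.w @ x) * x
```

For instance, tuning the learning rate online: feed a stream to two `SGDLinear` candidates with different rates and let `weighted_rolling_validation` pick the one with the smaller weighted rolling error, at the cost of one extra prediction per candidate per sample.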