🤖 AI Summary
This work addresses stochastic convex optimization in the presence of both additive and multiplicative noise, a setting in which conventional methods such as sample average approximation suffer significant performance degradation at realistic finite sample sizes. The paper proposes VISOR, an algorithm that integrates variance reduction with acceleration and, for the first time in the finite-sample regime, achieves instance-optimal performance matching information-theoretic local minimax lower bounds. Through a non-asymptotic analysis, the paper establishes the sharpest known instance-dependent generalization error bounds for generalized linear models, including linear regression, while attaining the best possible sample complexity (up to logarithmic factors) and optimal oracle complexity.
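For concreteness, one common way to formalize this kind of setting is sketched below; the notation here is our own illustrative assumption (the summary above does not fix one), and the paper's exact oracle model may differ.

```latex
% Illustrative formalization (assumed notation, not necessarily the paper's):
% minimize a smooth, strongly convex population loss
\min_{x \in \mathbb{R}^d} \; F(x) \;:=\; \mathbb{E}_{\xi}\big[f(x;\xi)\big],
% via a stochastic gradient oracle carrying zero-mean multiplicative noise
% M(\xi) and zero-mean additive noise v(\xi):
g(x;\xi) \;=\; \big(I_d + M(\xi)\big)\,\nabla F(x) \;+\; v(\xi),
\qquad \mathbb{E}[M(\xi)] = 0, \quad \mathbb{E}[v(\xi)] = 0 .
```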
📝 Abstract
We study the unconstrained minimization of a smooth and strongly convex population loss function under a stochastic oracle that introduces both additive and multiplicative noise, a canonical and widely studied setting that arises across operations research, signal processing, and machine learning. We begin by showing that standard approaches such as sample average approximation and robust (or averaged) stochastic approximation can lead to suboptimal, and in some cases arbitrarily poor, performance at realistic finite sample sizes. In contrast, we demonstrate that a carefully designed variance reduction strategy, which we term VISOR, can significantly outperform these approaches while using the same sample size. Our upper bounds are complemented by finite-sample, information-theoretic local minimax lower bounds, which highlight fundamental, instance-dependent factors that govern the performance of any estimator. Taken together, these results show that an accelerated variant of VISOR is instance-optimal, achieving the best possible sample complexity up to logarithmic factors while also attaining optimal oracle complexity. We apply our theory to generalized linear models and improve upon classical results; in particular, we obtain the best-known non-asymptotic, instance-dependent generalization error bounds for stochastic methods, even in the case of linear regression.
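The abstract does not spell out VISOR's update rule, so the following is only a minimal sketch of the kind of variance-reduced step it alludes to, in the style of SVRG. Every name here (`svrg_sketch`, `grad`, `sample`, and the toy least-squares instance) is an illustrative assumption, not the authors' method.

```python
import numpy as np

def svrg_sketch(grad, sample, x0, step_size, n_epochs, inner_steps, anchor_batch, rng=None):
    """Generic SVRG-style variance reduction; an illustrative sketch only,
    not the VISOR algorithm from the paper.

    grad(x, xi)  returns the noisy per-sample gradient of the loss at x.
    sample(rng)  draws a fresh sample xi from the stochastic oracle.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_epochs):
        # Anchor point and an averaged gradient estimate at it.
        x_tilde = x.copy()
        g_tilde = np.mean([grad(x_tilde, sample(rng)) for _ in range(anchor_batch)], axis=0)
        for _ in range(inner_steps):
            xi = sample(rng)
            # Control-variate gradient: the same sample is evaluated at the
            # current iterate and at the anchor, cancelling much of the noise.
            v = grad(x, xi) - grad(x_tilde, xi) + g_tilde
            x -= step_size * v
    return x

# Toy usage: least squares with a random design (multiplicative noise source)
# and additive observation noise.
d = 2
x_star = np.array([2.0, -1.0])

def sample(rng):
    a = rng.normal(size=d)                 # random design vector
    y = a @ x_star + 0.1 * rng.normal()    # additive label noise
    return a, y

def grad(x, xi):
    a, y = xi
    return (a @ x - y) * a                 # per-sample least-squares gradient

x_hat = svrg_sketch(grad, sample, x0=np.zeros(d), step_size=0.05,
                    n_epochs=5, inner_steps=200, anchor_batch=200)
```

Evaluating the same sample at the current iterate and at an anchor point is what makes the gradient estimate's variance shrink as the iterate approaches the anchor; this is the generic intuition behind variance reduction, while the specific VISOR construction and its accelerated, instance-optimal variant are developed in the full paper.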