🤖 AI Summary
This work studies generalized linear prediction in a single-pass streaming setting with non-quadratic losses and a possibly misspecified model. It introduces momentum into stochastic gradient descent for this setting for the first time, proposing a data-dependent proximal optimization method that incorporates dual momentum to achieve acceleration. The approach resolves an open problem posed by Jain et al. [2018a], establishing a refined excess risk bound that decomposes into an optimization error, a minimax-optimal statistical error, and a higher-order model-misspecification error. The analysis rests on a fine-grained stationarity analysis of the inner updates and a two-phase outer-loop characterization that localizes the statistical error. Both theory and experiments show that momentum-based acceleration outperforms existing variance-reduction methods.
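Schematically, the three-term decomposition described above can be written as follows; the $\varepsilon$ symbols are placeholder notation (not the paper's), and the exact rates are not reproduced here:

```latex
\mathbb{E}\,L(\hat{w}) - \min_{w} L(w)
\;\lesssim\;
\underbrace{\varepsilon_{\mathrm{opt}}}_{\substack{\text{optimization error,}\\ \text{improved by momentum}}}
\;+\;
\underbrace{\varepsilon_{\mathrm{stat}}}_{\substack{\text{minimax}\\ \text{statistical error}}}
\;+\;
\underbrace{\varepsilon_{\mathrm{mis}}}_{\substack{\text{higher-order}\\ \text{misspecification error}}}
```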
📝 Abstract
We study generalized linear prediction in a streaming setting, where each iteration uses only a single fresh data point for a gradient-level update. While momentum is well established in deterministic optimization, whether it can accelerate such single-pass, non-quadratic stochastic optimization has remained a fundamental open question. We propose the first algorithm that successfully incorporates momentum in this setting, via a novel data-dependent proximal method that achieves dual-momentum acceleration. The resulting excess risk bound decomposes into three components: an improved optimization error, a minimax-optimal statistical error, and a higher-order model-misspecification error. The proof handles misspecification via a fine-grained stationarity analysis of the inner updates and localizes the statistical error through a two-phase outer-loop analysis. As a result, we resolve the open problem posed by Jain et al. [2018a] and demonstrate that momentum acceleration is more effective than variance reduction for generalized linear prediction in the streaming setting.
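To make the streaming protocol concrete, here is a minimal sketch of single-pass SGD with classical heavy-ball momentum on a logistic model: each sample is consumed exactly once, and a momentum buffer is blended into every update. This illustrates the setting only, not the paper's data-dependent proximal method with dual momentum; the function name and hyperparameters (`streaming_momentum_sgd`, `eta`, `gamma`) are hypothetical choices for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def streaming_momentum_sgd(stream, dim, eta=0.05, gamma=0.9):
    """Single-pass heavy-ball SGD for logistic regression.

    Consumes each (x, y) pair from `stream` exactly once, matching the
    one-fresh-sample-per-iteration streaming setting; `eta` (step size)
    and `gamma` (momentum weight) are illustrative hyperparameters.
    """
    w = np.zeros(dim)  # current iterate
    v = np.zeros(dim)  # momentum (velocity) buffer
    for x, y in stream:
        grad = (sigmoid(x @ w) - y) * x  # stochastic gradient of the logistic loss
        v = gamma * v + grad             # blend the fresh gradient into the buffer
        w = w - eta * v                  # take the momentum step
    return w

# Toy usage: a synthetic logistic-regression stream with a known parameter.
rng = np.random.default_rng(0)
d = 5
w_star = rng.normal(size=d)
stream = ((x, float(rng.random() < sigmoid(x @ w_star)))
          for x in (rng.normal(size=d) for _ in range(20_000)))
w_hat = streaming_momentum_sgd(stream, d)
```

Because the stream is a generator, no sample is ever revisited, so the total number of stochastic gradient evaluations equals the stream length, as required by the single-pass setting.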