🤖 AI Summary
This work addresses stochastic convex composite optimization under a weak noise assumption: only finite variance of the stochastic gradients is assumed, without stronger tail conditions such as sub-Gaussianity. We propose a novel stochastic proximal point method that integrates variance reduction with proximal updates and incorporates an efficient iterative solver for the resulting subproblems. Theoretically, under bounded gradient variance, our method achieves high-probability convergence to an $\varepsilon$-accurate solution with $O(1/\varepsilon)$ sample complexity, strictly improving upon standard SGD-type algorithms. Our key contributions are threefold: (i) eliminating the strong noise assumptions prevalent in prior high-probability analyses; (ii) establishing the first low-sample-complexity, high-probability convergence framework for stochastic composite optimization; and (iii) unifying the treatment of nonsmooth problem structure and stochastic gradient error within a single algorithmic and analytical framework.
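To make the outer/inner structure concrete, below is a minimal sketch of a stochastic proximal point loop with an inexact inner solver, assuming an $\ell_1$ composite term so the proximal map is soft-thresholding. The names (`grad_sample`, the step-size schedule, iteration counts) are illustrative assumptions, not the paper's actual algorithm or subroutine.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (the nonsmooth composite term).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_prox_point(x0, grad_sample, lam, gamma=1.0,
                          outer_iters=50, inner_iters=20, seed=0):
    """Sketch of a stochastic proximal point loop (illustrative only).

    Each outer step approximately solves the subproblem
        min_y  E[f(y, xi)] + lam*||y||_1 + (1/(2*gamma)) * ||y - x||^2
    with a few inner proximal stochastic gradient steps; the added
    quadratic makes the subproblem strongly convex, which is what
    damps the stochastic gradient noise between outer iterates.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(outer_iters):
        y = x.copy()
        for t in range(inner_iters):
            eta = gamma / (t + 2)          # decaying inner step size (assumed)
            g = grad_sample(y, rng)        # unbiased stochastic gradient of f
            g += (y - x) / gamma           # gradient of the proximal quadratic
            y = soft_threshold(y - eta * g, eta * lam)
        x = y                              # accept the approximate prox point
    return x

# Toy usage: stochastic least squares, f(x) = 0.5*E[(a^T x - b)^2] + lam*||x||_1
rng0 = np.random.default_rng(1)
A = rng0.normal(size=(1000, 20))
x_true = np.zeros(20); x_true[:3] = 1.0
b = A @ x_true + 0.1 * rng0.normal(size=1000)

def grad_sample(x, rng):
    i = rng.integers(len(b))               # single-sample stochastic gradient
    return (A[i] @ x - b[i]) * A[i]

x_hat = stochastic_prox_point(np.zeros(20), grad_sample, lam=0.05)
```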
📝 Abstract
This paper proposes a stochastic proximal point method to solve a stochastic convex composite optimization problem. High-probability results in stochastic optimization typically hinge on restrictive assumptions on the stochastic gradient noise, for example, sub-Gaussian distributions. Assuming only weak conditions, such as bounded variance of the stochastic gradient, this paper establishes a low sample complexity for obtaining a high-probability guarantee on the convergence of the proposed method. Additionally, a notable aspect of this work is the development of a subroutine for solving the proximal subproblem, which also serves as a novel technique for variance reduction.
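For reference, a standard formulation of this problem class and of the proximal subproblem described above, written under assumed (not paper-specific) notation:

```latex
% Composite problem: smooth expectation term plus nonsmooth regularizer,
% with only a bounded-variance assumption on the stochastic gradients.
\min_{x \in \mathbb{R}^d} \; F(x) := f(x) + \psi(x),
\qquad f(x) = \mathbb{E}_{\xi}\big[ f(x, \xi) \big],
\qquad \mathbb{E}_{\xi}\big[ \|\nabla f(x,\xi) - \nabla f(x)\|^2 \big] \le \sigma^2 .

% Proximal-point subproblem at iterate x_k with parameter \gamma > 0,
% solved inexactly by the paper's subroutine:
x_{k+1} \approx \arg\min_{y} \; f(y) + \psi(y) + \tfrac{1}{2\gamma} \|y - x_k\|^2 .
```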