🤖 AI Summary
This work addresses the slow convergence of stochastic gradient methods and the reliance of classical trust-region methods on function value evaluations in unconstrained nonconvex optimization, proposing a novel algorithm that integrates the stochastic variance-reduced gradient (SVRG) technique with an adaptive stochastic trust-region framework. The method requires only stochastic gradient information—eliminating the need for explicit function value computations—and enhances efficiency through an adaptive trust-region radius adjustment. Notably, it is the first to combine SVRG with an adaptive trust-region mechanism, enabling the incorporation of stochastic (potentially gradient-dependent) second-order information. Theoretical analysis establishes that the algorithm converges in expectation to a first-order stationary point, achieving iteration and sample complexities comparable to state-of-the-art SVRG methods. Empirical results demonstrate its superior performance over SGD and Adam across multiple machine learning tasks.
📝 Abstract
We propose a stochastic trust-region method for unconstrained nonconvex optimization that incorporates the stochastic variance-reduced gradient (SVRG) technique to accelerate convergence. Unlike classical trust-region methods, the proposed algorithm relies solely on stochastic gradient information and does not require function value evaluations. The trust-region radius is adaptively adjusted based on a radius-control parameter and the stochastic gradient estimate. Under mild assumptions, we establish that the algorithm converges in expectation to a first-order stationary point. Moreover, the method achieves iteration and sample complexity bounds that match those of SVRG-based first-order methods, while allowing stochastic and potentially gradient-dependent second-order information. Extensive numerical experiments demonstrate that incorporating SVRG accelerates convergence, and that the use of trust-region methods and Hessian information further improves performance. We also highlight the impact of batch size and inner-loop length on efficiency, and show that the proposed method outperforms SGD and Adam on several machine learning tasks.
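To make the core idea concrete, the following is a minimal sketch of an SVRG-style gradient estimator combined with a gradient-only trust-region step on a toy least-squares problem. The snapshot/inner-loop structure is standard SVRG; the step rule and radius update (`delta`, the `0.95` shrink factor, the `0.1` step scale) are illustrative assumptions for exposition, not the paper's actual update rules or second-order model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: f(w) = (1/n) * sum_i 0.5 * (a_i^T w - b_i)^2
n, d = 200, 5
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
b = A @ w_true

def grad_i(w, i):
    """Stochastic gradient of the i-th component function."""
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    """Full-batch gradient, computed only at snapshot points."""
    return A.T @ (A @ w - b) / n

w_tilde = np.zeros(d)   # SVRG snapshot point
delta = 1.0             # trust-region radius (illustrative update below)
for epoch in range(30):
    mu = full_grad(w_tilde)        # full gradient at the snapshot
    w = w_tilde.copy()
    for _ in range(n):             # inner loop of length m = n
        i = rng.integers(n)
        # SVRG variance-reduced gradient estimate
        g = grad_i(w, i) - grad_i(w_tilde, i) + mu
        gnorm = np.linalg.norm(g)
        if gnorm > 0:
            # Gradient-only trust-region-style step: move along -g,
            # with the step length capped by the radius `delta`
            # (no function value evaluations are used).
            w -= min(delta, 0.1 * gnorm) * g / gnorm
    w_tilde = w
    delta *= 0.95                  # illustrative radius shrinkage

print(np.linalg.norm(full_grad(w_tilde)))  # small gradient norm at the end
```

Because the variance of the estimate `g` vanishes as `w` and `w_tilde` approach a stationary point, no decaying step size is needed for the gradient estimate itself, which is the acceleration SVRG provides over plain SGD.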