🤖 AI Summary
This paper addresses the computational bottleneck in high-dimensional nonconvex optimization arising from model overparameterization. We propose the Stochastic Subspace Cubic Newton (SSCN) method, which approximates Hessian information within randomly sampled low-dimensional subspaces. SSCN is the first cubic-regularized Newton method with adaptive subspace dimension selection, supporting inexact curvature estimation and providing a unified convergence analysis. By incorporating an adaptive sampling strategy, it achieves the optimal oracle complexity of $O(\varepsilon^{-3/2})$ for reaching a second-order stationary point. The theoretical framework integrates stochastic coordinate sampling, subspace-restricted Newton updates, and nonconvex optimization theory. Empirical evaluations demonstrate that SSCN significantly outperforms state-of-the-art first-order methods, particularly on large-scale, overparameterized machine learning models, while maintaining computational efficiency through subspace acceleration.
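To make the subspace-restricted update concrete, one standard way to write a cubic-regularized step on a sampled coordinate set is shown here. The symbols $S$ (sampled coordinates), $\tau = |S|$, $M$ (regularization constant) and $h$ (the step) are notation assumed for illustration; the summary itself does not define them.

$$
h_S \in \arg\min_{h \in \mathbb{R}^{\tau}} \; \langle \nabla f(x)_{S},\, h \rangle + \tfrac{1}{2} \langle \nabla^2 f(x)_{SS}\, h,\, h \rangle + \tfrac{M}{6} \|h\|^{3}, \qquad x^{+}_{S} = x_{S} + h_S, \quad x^{+}_{i} = x_{i} \ \text{for } i \notin S .
$$

Only the $\tau$ gradient entries and the $\tau \times \tau$ Hessian block indexed by $S$ enter the subproblem, which is where the cost reduction over a full cubic Newton step would come from.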
📝 Abstract
This paper addresses the problem of minimizing non-convex continuous functions, which is relevant to high-dimensional machine learning applications characterized by over-parametrization. We analyze a randomized coordinate second-order method named SSCN, which can be interpreted as applying cubic regularization in random subspaces. This approach reduces the computational cost of using second-order information, making it applicable in higher-dimensional scenarios. Theoretically, we establish convergence guarantees for non-convex functions, with interpolating rates for arbitrary subspace sizes and allowing inexact curvature estimation. As the subspace size increases, our complexity matches the $\mathcal{O}(\epsilon^{-3/2})$ rate of cubic regularization (CR). Additionally, we propose an adaptive sampling scheme ensuring an exact convergence rate of $\mathcal{O}(\epsilon^{-3/2}, \epsilon^{-3})$ to a second-order stationary point, even without sampling all coordinates. Experimental results demonstrate substantial speed-ups achieved by SSCN compared to conventional first-order methods.
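The abstract describes SSCN as cubic regularization applied in random coordinate subspaces. The following is a minimal sketch of what one such step could look like, not the authors' implementation. It assumes uniform sampling of `tau` coordinates without replacement, a fixed regularization constant `M` at least as large as the Hessian's Lipschitz constant, and exact block oracles. The names `cubic_subproblem`, `sscn_step`, `grad_block`, `hess_block` and the toy objective `f` are hypothetical, and the subproblem solver uses plain bisection without special handling of the degenerate "hard case".

```python
import numpy as np


def cubic_subproblem(g, H, M, tol=1e-10, max_iter=200):
    """Approximately minimize g@h + 0.5*h@H@h + (M/6)*||h||**3 over h.

    Uses the optimality condition (H + (M/2)*r*I) h = -g with r = ||h||
    and bisects on r. The degenerate "hard case" is not treated specially.
    """
    lam, Q = np.linalg.eigh(H)            # H = Q @ diag(lam) @ Q.T, lam ascending
    g_rot = Q.T @ g

    def step_norm(r):
        return np.linalg.norm(g_rot / (lam + 0.5 * M * r))

    r_lo = max(0.0, -2.0 * lam[0] / M)    # below this the shifted matrix is not PSD
    r_hi = r_lo + 1.0
    while step_norm(r_hi) > r_hi:         # grow until the root is bracketed
        r_hi *= 2.0
    for _ in range(max_iter):
        r = 0.5 * (r_lo + r_hi)
        if step_norm(r) > r:
            r_lo = r
        else:
            r_hi = r
        if r_hi - r_lo <= tol * max(1.0, r_hi):
            break
    return -Q @ (g_rot / (lam + 0.5 * M * r_hi))


def sscn_step(x, grad_block, hess_block, tau, M, rng):
    """One subspace cubic Newton step on tau uniformly sampled coordinates."""
    S = rng.choice(x.size, size=tau, replace=False)
    h = cubic_subproblem(grad_block(x, S), hess_block(x, S), M)
    x_new = x.copy()
    x_new[S] += h                         # coordinates outside S are unchanged
    return x_new


# Toy nonconvex objective (an assumption for illustration, not from the paper):
# f(x) = sum_i cos(x_i) + 0.05 * (sum_i x_i)^2, whose Hessian is 1-Lipschitz.
def f(x):
    return np.sum(np.cos(x)) + 0.05 * np.sum(x) ** 2


def grad_block(x, S):
    # Entries of the gradient indexed by S.
    return -np.sin(x[S]) + 0.1 * np.sum(x)


def hess_block(x, S):
    # The |S| x |S| block of the Hessian indexed by S.
    return -np.diag(np.cos(x[S])) + 0.1 * np.ones((len(S), len(S)))


rng = np.random.default_rng(0)
x = rng.normal(size=50)
for k in range(201):
    if k % 50 == 0:
        print(f"iter {k:3d}  f(x) = {f(x):.6f}")
    x = sscn_step(x, grad_block, hess_block, tau=5, M=1.0, rng=rng)
```

Each step works with a `tau` by `tau` Hessian block only, so the eigendecomposition costs on the order of `tau**3` operations instead of `n**3`. The adaptive sampling scheme and the inexact curvature estimates mentioned in the abstract are not modeled in this sketch.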