Cubic regularized subspace Newton for non-convex optimization

📅 2024-06-24
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the computational bottleneck in high-dimensional nonconvex optimization arising from model overparameterization. We propose the Stochastic Subspace Cubic Newton (SSCN) method, which approximates Hessian information within randomly sampled low-dimensional subspaces. SSCN is the first cubic-regularized Newton method with adaptive subspace dimension selection, supporting inexact curvature estimation and providing a unified convergence analysis. By incorporating an adaptive sampling strategy, it attains the $O(\varepsilon^{-3/2})$ oracle complexity of full cubic regularization for reaching a second-order stationary point. The theoretical framework integrates stochastic coordinate sampling, subspace-restricted Newton updates, and nonconvex optimization theory. Empirical evaluations demonstrate that SSCN significantly outperforms state-of-the-art first-order methods, particularly on large-scale, overparameterized machine learning models, while keeping per-iteration cost low by restricting computation to subspaces.
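
For reference, the subspace cubic-regularized model implied by this description is the following (notation assembled here from the summary and abstract; the paper's own symbols may differ):

$$
h^\star \in \arg\min_{h \in \mathbb{R}^{|S|}} \; \langle \nabla_S f(x), h \rangle + \tfrac{1}{2} \langle \nabla^2_S f(x)\, h, h \rangle + \tfrac{M}{6} \|h\|^3,
$$

where $S$ is a randomly sampled coordinate subset, $\nabla_S f$ and $\nabla^2_S f$ are the gradient and Hessian restricted to those coordinates, and $M$ is the cubic-regularization constant. The iterate is then updated only on the coordinates in $S$.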

📝 Abstract
This paper addresses the optimization problem of minimizing non-convex continuous functions, which is relevant in the context of high-dimensional machine learning applications characterized by over-parametrization. We analyze a randomized coordinate second-order method named SSCN, which can be interpreted as applying cubic regularization in random subspaces. This approach effectively reduces the computational complexity associated with utilizing second-order information, rendering it applicable in higher-dimensional scenarios. Theoretically, we establish convergence guarantees for non-convex functions, with interpolating rates for arbitrary subspace sizes and allowing inexact curvature estimation. As the subspace size increases, our complexity matches the $\mathcal{O}(\epsilon^{-3/2})$ rate of cubic regularization (CR). Additionally, we propose an adaptive sampling scheme that ensures an exact convergence rate of $\mathcal{O}(\epsilon^{-3/2}, \epsilon^{-3})$ to a second-order stationary point, even without sampling all coordinates. Experimental results demonstrate substantial speed-ups achieved by SSCN compared to conventional first-order methods.
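
To make the mechanics concrete, below is a minimal NumPy sketch of one SSCN-style iteration based on the model above. The function names (`solve_cubic_subproblem`, `sscn_step`), the bisection-based subproblem solver, and all constants are our illustrative assumptions, not the paper's reference implementation; the degenerate "hard case" of the cubic subproblem is ignored.

```python
import numpy as np

def solve_cubic_subproblem(g, H, M, tol=1e-8, max_iter=100):
    """Minimize m(h) = <g, h> + 0.5 <H h, h> + (M/6) ||h||^3.

    The global minimizer satisfies (H + (M r / 2) I) h = -g with
    r = ||h|| (Nesterov & Polyak, 2006); we locate r by bisection.
    The degenerate "hard case" is ignored in this sketch.
    """
    n = g.size
    lam_min = np.linalg.eigvalsh(H)[0]
    r_lo = max(0.0, -2.0 * lam_min / M) + 1e-12  # keep the shifted matrix PD

    def step_norm(r):
        h = np.linalg.solve(H + 0.5 * M * r * np.eye(n), -g)
        return np.linalg.norm(h)

    # Grow the upper bracket until ||h(r)|| <= r, then bisect.
    r_hi = max(1.0, 2.0 * r_lo)
    while step_norm(r_hi) > r_hi:
        r_hi *= 2.0
    for _ in range(max_iter):
        r = 0.5 * (r_lo + r_hi)
        if step_norm(r) > r:
            r_lo = r
        else:
            r_hi = r
        if r_hi - r_lo < tol:
            break
    r = 0.5 * (r_lo + r_hi)
    return np.linalg.solve(H + 0.5 * M * r * np.eye(n), -g)

def sscn_step(x, grad_fn, hess_fn, tau, M, rng):
    """One SSCN-style iteration: sample tau coordinates, restrict the
    gradient and Hessian to them, and take a cubic-regularized Newton
    step inside that subspace only."""
    d = x.size
    S = rng.choice(d, size=tau, replace=False)   # random coordinate subset
    g_S = grad_fn(x)[S]                          # restricted gradient
    H_S = hess_fn(x)[np.ix_(S, S)]               # principal Hessian block
    h = solve_cubic_subproblem(g_S, H_S, M)
    x_new = x.copy()
    x_new[S] += h
    return x_new
```

For clarity the sketch forms full gradients and Hessians before restricting; a practical implementation would evaluate only the $|S|$ gradient entries and the $|S| \times |S|$ Hessian block, which is the source of the claimed cost reduction.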
Problem

Research questions and friction points this paper is trying to address.

How to optimize non-convex functions efficiently using second-order information
The prohibitive per-iteration cost of second-order methods in high dimensions
How to guarantee convergence to second-order stationary points without sampling all coordinates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cubic regularization in subspaces
Randomized coordinate second-order method
Adaptive sampling for exact convergence (a hypothetical instantiation is sketched after this list)
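
A hypothetical driver loop showing one way an adaptive sampling schedule could be wired around `sscn_step` from the sketch above. The growth trigger, the initial subspace size, and all constants are illustrative assumptions, not the paper's actual rule.

```python
import numpy as np

def sscn_optimize(x0, grad_fn, hess_fn, M, eps, max_iters=1000, seed=0):
    """Run SSCN-style steps with a (hypothetical) adaptive subspace size:
    start cheap, then enlarge the sampled subspace as the iterate nears
    stationarity, up to the full dimension."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    d = x.size
    tau = max(1, d // 10)                 # initial subspace size (assumed)
    for _ in range(max_iters):
        g = grad_fn(x)
        if np.linalg.norm(g) <= eps:      # first-order stationarity
            break
        x = sscn_step(x, grad_fn, hess_fn, tau, M, rng)
        if np.linalg.norm(g) <= np.sqrt(eps):  # illustrative trigger
            tau = min(d, 2 * tau)         # sample more coordinates
    return x
```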
Jim Zhao
PhD Computer Science, University of Basel
Optimization · Machine Learning
Aurélien Lucchi
University of Basel
N. Doikov
EPFL