Cubic regularized subspace Newton for non-convex optimization

📅 2024-06-24
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the computational bottleneck in high-dimensional nonconvex optimization arising from model overparameterization. We propose the Stochastic Subspace Cubic Newton (SSCN) method, which approximates Hessian information within randomly sampled low-dimensional subspaces. SSCN is the first cubic-regularized Newton method with adaptive subspace dimension selection, supporting inexact curvature estimation and providing a unified convergence analysis. By incorporating an adaptive sampling strategy, it attains the $O(\varepsilon^{-3/2})$ oracle complexity of full cubic regularization for reaching a second-order stationary point. The theoretical framework integrates stochastic coordinate sampling, subspace-restricted Newton updates, and nonconvex optimization theory. Empirical evaluations demonstrate that SSCN significantly outperforms state-of-the-art first-order methods, particularly on large-scale, overparameterized machine learning models, while keeping per-iteration cost low by restricting computation to subspaces.
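
For reference, the subspace cubic-regularized model implied by this description is the following (notation assembled here from the summary and abstract; the paper's own symbols may differ):

$$
h^\star \in \arg\min_{h \in \mathbb{R}^{|S|}} \; \langle \nabla_S f(x), h \rangle + \tfrac{1}{2} \langle \nabla^2_S f(x)\, h, h \rangle + \tfrac{M}{6} \|h\|^3,
$$

where $S$ is a randomly sampled coordinate subset, $\nabla_S f$ and $\nabla^2_S f$ are the gradient and Hessian restricted to those coordinates, and $M$ is the cubic-regularization constant. The iterate is then updated only on the coordinates in $S$.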

📝 Abstract
This paper addresses the optimization problem of minimizing non-convex continuous functions, which is relevant in the context of high-dimensional machine learning applications characterized by over-parametrization. We analyze a randomized coordinate second-order method named SSCN, which can be interpreted as applying cubic regularization in random subspaces. This approach effectively reduces the computational complexity associated with utilizing second-order information, rendering it applicable in higher-dimensional scenarios. Theoretically, we establish convergence guarantees for non-convex functions, with interpolating rates for arbitrary subspace sizes and allowing inexact curvature estimation. As the subspace size increases, our complexity matches the $\mathcal{O}(\epsilon^{-3/2})$ rate of cubic regularization (CR). Additionally, we propose an adaptive sampling scheme that ensures an exact convergence rate of $\mathcal{O}(\epsilon^{-3/2}, \epsilon^{-3})$ to a second-order stationary point, even without sampling all coordinates. Experimental results demonstrate substantial speed-ups achieved by SSCN compared to conventional first-order methods.
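
To make the mechanics concrete, below is a minimal NumPy sketch of one SSCN-style iteration based on the model above. The function names (`solve_cubic_subproblem`, `sscn_step`), the bisection-based subproblem solver, and all constants are our illustrative assumptions, not the paper's reference implementation; the degenerate "hard case" of the cubic subproblem is ignored.

```python
import numpy as np

def solve_cubic_subproblem(g, H, M, tol=1e-8, max_iter=100):
    """Minimize m(h) = <g, h> + 0.5 <H h, h> + (M/6) ||h||^3.

    The global minimizer satisfies (H + (M r / 2) I) h = -g with
    r = ||h|| (Nesterov & Polyak, 2006); we locate r by bisection.
    The degenerate "hard case" is ignored in this sketch.
    """
    n = g.size
    lam_min = np.linalg.eigvalsh(H)[0]
    r_lo = max(0.0, -2.0 * lam_min / M) + 1e-12  # keep the shifted matrix PD

    def step_norm(r):
        h = np.linalg.solve(H + 0.5 * M * r * np.eye(n), -g)
        return np.linalg.norm(h)

    # Grow the upper bracket until ||h(r)|| <= r, then bisect.
    r_hi = max(1.0, 2.0 * r_lo)
    while step_norm(r_hi) > r_hi:
        r_hi *= 2.0
    for _ in range(max_iter):
        r = 0.5 * (r_lo + r_hi)
        if step_norm(r) > r:
            r_lo = r
        else:
            r_hi = r
        if r_hi - r_lo < tol:
            break
    r = 0.5 * (r_lo + r_hi)
    return np.linalg.solve(H + 0.5 * M * r * np.eye(n), -g)

def sscn_step(x, grad_fn, hess_fn, tau, M, rng):
    """One SSCN-style iteration: sample tau coordinates, restrict the
    gradient and Hessian to them, and take a cubic-regularized Newton
    step inside that subspace only."""
    d = x.size
    S = rng.choice(d, size=tau, replace=False)   # random coordinate subset
    g_S = grad_fn(x)[S]                          # restricted gradient
    H_S = hess_fn(x)[np.ix_(S, S)]               # principal Hessian block
    h = solve_cubic_subproblem(g_S, H_S, M)
    x_new = x.copy()
    x_new[S] += h
    return x_new
```

For clarity the sketch forms full gradients and Hessians before restricting; a practical implementation would evaluate only the $|S|$ gradient entries and the $|S| \times |S|$ Hessian block, which is the source of the claimed cost reduction.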
Problem

Research questions and friction points this paper is trying to address.

How to optimize non-convex functions efficiently using second-order information
The prohibitive per-iteration cost of second-order methods in high dimensions
How to guarantee convergence to second-order stationary points without sampling all coordinates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cubic regularization in subspaces
Randomized coordinate second-order method
Adaptive sampling for exact convergence (a hypothetical instantiation is sketched after this list)
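
A hypothetical driver loop showing one way an adaptive sampling schedule could be wired around `sscn_step` from the sketch above. The growth trigger, the initial subspace size, and all constants are illustrative assumptions, not the paper's actual rule.

```python
import numpy as np

def sscn_optimize(x0, grad_fn, hess_fn, M, eps, max_iters=1000, seed=0):
    """Run SSCN-style steps with a (hypothetical) adaptive subspace size:
    start cheap, then enlarge the sampled subspace as the iterate nears
    stationarity, up to the full dimension."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    d = x.size
    tau = max(1, d // 10)                 # initial subspace size (assumed)
    for _ in range(max_iters):
        g = grad_fn(x)
        if np.linalg.norm(g) <= eps:      # first-order stationarity
            break
        x = sscn_step(x, grad_fn, hess_fn, tau, M, rng)
        if np.linalg.norm(g) <= np.sqrt(eps):  # illustrative trigger
            tau = min(d, 2 * tau)         # sample more coordinates
    return x
```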
Jim Zhao
PhD Computer Science, University of Basel
Optimization · Machine Learning
Aurélien Lucchi
University of Basel
N. Doikov
EPFL