🤖 AI Summary
To address the prohibitively high computational cost of Riemannian retractions in large-scale optimization under orthogonality constraints, this paper proposes Random Submanifold Optimization (RSMO). At each iteration, RSMO restricts the tangent-space search to a low-dimensional random submanifold and applies a simplified retraction there, drastically reducing per-iteration complexity. The method supports two efficient sampling strategies and establishes convergence guarantees in general non-convex, Riemannian Polyak–Łojasiewicz (PL), and stochastic settings. Moreover, it extends naturally to quotient manifolds induced by the orthogonal group. Experiments on large-scale tasks—including matrix completion, principal component analysis (PCA), and orthogonalization of Transformer weights—demonstrate that RSMO achieves a 2–5× speedup over state-of-the-art methods while preserving solution accuracy, significantly enhancing the scalability of orthogonally constrained optimization.
📝 Abstract
Optimization with orthogonality constraints frequently arises in many fields, including machine learning. Riemannian optimization offers a powerful framework for solving these problems by equipping the constraint set with a Riemannian manifold structure and performing optimization intrinsically on the manifold. This approach typically involves computing a search direction in the tangent space and updating variables via a retraction operation. However, as the size of the variables increases, the computational cost of the retraction can become prohibitively high, limiting the applicability of Riemannian optimization to large-scale problems. To address this challenge and enhance scalability, we propose a novel approach that restricts each update to a random submanifold, thereby significantly reducing the per-iteration complexity. We introduce two sampling strategies for selecting the random submanifolds and theoretically analyze the convergence of the proposed methods. We provide convergence results for general nonconvex functions and for functions satisfying the Riemannian Polyak–Łojasiewicz (PL) condition, as well as for stochastic optimization settings. Additionally, we demonstrate how our approach can be generalized to quotient manifolds derived from the orthogonal manifold. Extensive experiments verify the benefits of the proposed method across a wide variety of problems.
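To make the tangent-space-then-retraction pattern concrete, below is a minimal NumPy sketch on the Stiefel manifold (matrices with orthonormal columns): a standard QR-based retraction, the tangent-space projection of a Euclidean gradient, and a toy update that retracts only a randomly sampled block of columns. This is an illustration of the general "restrict the update to a random submanifold" idea under simplifying assumptions, not the paper's RSMO algorithm; the step size, sampling scheme, and re-orthogonalization step here are all hypothetical choices for the example.

```python
import numpy as np

def qr_retraction(X, xi):
    # Map the ambient-space step X + xi back onto the Stiefel manifold
    # via a reduced QR decomposition; fix column signs for determinism.
    Q, R = np.linalg.qr(X + xi)
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)

def project_tangent(X, G):
    # Project a Euclidean gradient G onto the tangent space at X.
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2

rng = np.random.default_rng(0)
n, p, r = 1000, 50, 5            # ambient sizes; r << p sampled columns
X = np.linalg.qr(rng.standard_normal((n, p)))[0]
G = rng.standard_normal((n, p))  # stand-in for a Euclidean gradient

# Full update: the retraction cost grows with the full width p.
X_full = qr_retraction(X, -0.1 * project_tangent(X, G))

# Submanifold-style update (toy): retract only r sampled columns,
# then re-orthogonalize them against the untouched columns.
idx = rng.choice(p, size=r, replace=False)
sub = qr_retraction(X[:, idx], -0.1 * project_tangent(X, G)[:, idx])
fixed = np.delete(X, idx, axis=1)
sub -= fixed @ (fixed.T @ sub)   # remove components along fixed columns
X_sub = X.copy()
X_sub[:, idx] = np.linalg.qr(sub)[0]

# Both updates keep the columns orthonormal.
print(np.allclose(X_full.T @ X_full, np.eye(p), atol=1e-8))
print(np.allclose(X_sub.T @ X_sub, np.eye(p), atol=1e-8))
```

The point of the sketch is the cost asymmetry: the full update factors an n×p matrix each iteration, while the block update factors only an n×r matrix, which is the source of the per-iteration savings the paper targets.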