AI Summary
Existing gradient-descent approaches to k-SVD (computing the top-k singular values and vectors of a matrix) rely on oracle-provided step sizes and lack global convergence guarantees in the generic setting. Method: This paper proposes an adaptive step-size gradient descent method that eliminates the need for oracle step sizes, combining a preconditioning-inspired step-size rule, random initialization, and nonconvex optimization analysis. Contribution/Results: It establishes, for the first time under generic parameter settings, global linear convergence to the top-k singular values and vectors of any matrix. The analysis reveals that, within the region of attraction, the method is equivalent to Heron's iteration. Empirical evaluations demonstrate that the algorithm is both efficient and robust on large-scale matrices, significantly enhancing the scalability and practicality of k-SVD for ultra-large matrices.
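To make the Heron's-iteration connection concrete, here is a minimal sketch (not taken from the paper): the Babylonian update $x \leftarrow (x + a/x)/2$ converges rapidly to $\sqrt{a}$, and the summary states that the gradient method behaves equivalently inside its region of attraction.

```python
# Heron's (Babylonian) method for computing sqrt(a).
# Illustrative only; the function name and defaults are our own choices.
def heron_sqrt(a, x0=1.0, iters=20):
    x = x0
    for _ in range(iters):
        x = 0.5 * (x + a / x)  # average x with a/x; converges quadratically
    return x

print(heron_sqrt(2.0))  # ~1.4142135623..., i.e. sqrt(2)
```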
Abstract
We show that gradient descent with a simple, universal rule for step-size selection provably finds the $k$-SVD, i.e., the $k \geq 1$ largest singular values and corresponding singular vectors, of any matrix, despite nonconvexity. There has been substantial progress on this problem in the past few years: existing results establish such guarantees for the *exact-parameterized* and *over-parameterized* settings, but with an oracle-provided choice of step size. Guarantees for the generic setting, with a step-size selection that does not require oracle-provided information, have remained a challenge. We overcome this challenge and establish that gradient descent with an appealingly simple adaptive step size (akin to preconditioning) and random initialization enjoys global linear convergence in the generic setting. Our convergence analysis reveals that the gradient method has an attracting region, and within this attracting region, the method behaves like Heron's method (a.k.a. the Babylonian method). Empirically, we validate the theoretical results. The emergence of modern compute infrastructure for iterative optimization, coupled with this work, is likely to provide the means to solve $k$-SVD for very large matrices.
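The following is a hypothetical minimal sketch of the idea for the top singular pair ($k = 1$), not the paper's exact algorithm: run gradient descent on $f(u) = \tfrac{1}{4}\|A^\top A - uu^\top\|_F^2$ from a random initialization, with the preconditioning-like adaptive step size $\eta_t = 1/(2\|u_t\|^2)$. With this choice the update simplifies to $u \leftarrow u/2 + A^\top A\,u/(2\|u\|^2)$, an averaging step reminiscent of Heron's iteration; at a fixed point, $\|u\|$ equals the top singular value of $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
M = A.T @ A                          # Gram matrix; top eigenvalue = sigma_1(A)^2

u = rng.standard_normal(20)          # random initialization
for _ in range(500):
    step = 1.0 / (2.0 * (u @ u))     # adaptive, preconditioning-like step size
    grad = (u @ u) * u - M @ u       # gradient of f(u) = (1/4)||M - u u^T||_F^2
    u = u - step * grad              # equals u/2 + M u / (2 ||u||^2)

sigma1 = np.linalg.norm(A, 2)        # top singular value, for comparison
print(abs(np.sqrt(u @ u) - sigma1))  # error vs. NumPy (should be near zero)
```

Note that the simplified update averages `u` with `M @ u / (u @ u)`, which is why, near convergence, the norm $\|u\|^2$ evolves like a Heron iteration toward the top eigenvalue of $M$.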