🤖 AI Summary
This paper addresses the optimal estimation problem in streaming sparse principal component analysis (Sparse PCA) under single-pass, $O(d)$-space, and $O(nd)$-time constraints. Methodologically, it proposes the first single-pass algorithm that requires neither strong initialization nor structural assumptions on the covariance matrix (e.g., spikedness), by hard-thresholding the output of Oja's iteration to construct a thresholded Oja vector. Theoretically, it establishes a novel analytical framework based on projected products of random matrices for the unnormalized Oja iterates, integrated with effective-rank theory to characterize convergence. Under standard sub-Gaussian sampling and sparsity assumptions, the algorithm achieves the minimax-optimal $\sin^2$-error rate $O(s \log d / n)$, providing the first tight statistical optimality guarantee for any single-pass sparse PCA algorithm.
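To make the procedure concrete, here is a minimal single-pass sketch of the thresholded-Oja idea in NumPy. Everything here is illustrative rather than the paper's exact algorithm: the function name `thresholded_oja`, the fixed step size `eta`, the per-step renormalization (the paper analyzes the unnormalized iterates), and the choice of threshold (keeping the $s$ largest-magnitude entries) are all assumptions.

```python
import numpy as np

def thresholded_oja(X, eta, s, seed=0):
    """Hypothetical sketch: Oja's iteration followed by hard thresholding.

    X    : (n, d) array, streamed one row at a time (single pass)
    eta  : step size (assumed fixed here; the paper's schedule may differ)
    s    : sparsity level -- keep the s largest-magnitude entries
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)        # random initial vector
    w /= np.linalg.norm(w)

    # Oja update: O(d) work and O(d) memory per sample.
    for x in X:
        w = w + eta * x * (x @ w)     # w <- (I + eta * x x^T) w
        w /= np.linalg.norm(w)        # renormalized here for numerical
                                      # stability; the paper's analysis
                                      # tracks the unnormalized iterates

    # Hard-threshold the Oja vector: zero all but the s largest-magnitude
    # entries, then renormalize to get an s-sparse unit vector.
    keep = np.argsort(np.abs(w))[-s:]
    v = np.zeros(d)
    v[keep] = w[keep]
    return v / np.linalg.norm(v)
```

Note the single `for x in X` loop: each sample is touched once, so the total cost is $O(nd)$ time and $O(d)$ memory, matching the stated computational budget.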
📝 Abstract
Oja's algorithm for streaming Principal Component Analysis (PCA) on $n$ data points in a $d$-dimensional space achieves the same $\sin^2$ error $O(r_{\mathsf{eff}}/n)$ as the offline algorithm, using $O(d)$ space, $O(nd)$ time, and a single pass through the data. Here $r_{\mathsf{eff}}$ is the effective rank (the ratio of the trace to the principal eigenvalue of the population covariance matrix $\Sigma$). Under this computational budget, we consider the problem of sparse PCA, where the principal eigenvector of $\Sigma$ is $s$-sparse and $r_{\mathsf{eff}}$ can be large. In this setting, to our knowledge, \textit{there are no known single-pass algorithms} that achieve the minimax error bound in $O(d)$ space and $O(nd)$ time without either requiring strong initialization conditions or assuming further structure (e.g., spiked) of the covariance matrix. We show that a simple single-pass procedure that thresholds the output of Oja's algorithm (the Oja vector) can achieve the minimax error bound, under some regularity conditions, in $O(d)$ space and $O(nd)$ time. We present a nontrivial and novel analysis of the entries of the unnormalized Oja vector, which involves the projection of a product of independent random matrices onto a random initial vector. This is completely different from previous analyses of Oja's algorithm and matrix products, which were done when $r_{\mathsf{eff}}$ is bounded.
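For reference, the two quantities the abstract measures against are easy to compute: the $\sin^2$ error between unit vectors $\hat v$ and $v^\star$ is $1 - \langle \hat v, v^\star \rangle^2$, and $r_{\mathsf{eff}} = \operatorname{tr}(\Sigma)/\lambda_1(\Sigma)$. The sketch below checks them against the offline baseline (top eigenvector of the sample covariance); the synthetic spiked covariance is purely an illustrative test case, since the paper's guarantees do not assume spiked structure.

```python
import numpy as np

def sin2_error(v_hat, v_star):
    # sin^2 distance between unit vectors: 1 - <v_hat, v_star>^2
    return 1.0 - float(v_hat @ v_star) ** 2

def effective_rank(Sigma):
    # r_eff = trace(Sigma) / principal eigenvalue of Sigma
    return np.trace(Sigma) / np.linalg.eigvalsh(Sigma)[-1]

# Quick synthetic check with an s-sparse leading eigenvector
# (an illustrative spiked covariance, used only for this demo).
d, s, n = 200, 5, 10_000
v_star = np.zeros(d)
v_star[:s] = 1.0 / np.sqrt(s)
Sigma = np.eye(d) + 4.0 * np.outer(v_star, v_star)
X = np.random.default_rng(1).multivariate_normal(np.zeros(d), Sigma, size=n)

# Offline baseline: top eigenvector of the sample covariance.
v_off = np.linalg.eigh(X.T @ X / n)[1][:, -1]
print(effective_rank(Sigma), sin2_error(v_off, v_star))
```

The streaming `thresholded_oja` sketch above can be evaluated the same way, by passing `X` through it once and comparing its output against `v_star`.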