🤖 AI Summary
This work addresses the challenge of quantifying uncertainty for individual components of eigenvectors in streaming principal component analysis (PCA), a problem previously lacking rigorous statistical guarantees. We propose the first coordinate-wise confidence intervals for the leading eigenvector estimated by Oja's algorithm, departing from conventional global error metrics such as the sin²θ distance. Methodologically, we derive the first sharp Bernstein-type concentration inequality and coordinate-level central limit theorem for eigenvector components in streaming PCA. To enable practical inference, we introduce a median-of-means subsampling variance estimator that is both provably consistent and computationally efficient. Experiments on high-dimensional streaming data demonstrate accurate coverage and strong robustness of the proposed confidence intervals, with a runtime only 20–33% of that of the multiplier bootstrap, a substantial gain in scalability and interpretability for online statistical inference.
📝 Abstract
We propose a novel statistical inference framework for streaming principal component analysis (PCA) using Oja's algorithm, enabling the construction of confidence intervals for individual entries of the estimated eigenvector. Most existing work on streaming PCA focuses on providing sharp sin-squared error guarantees. Recently, there has been some interest in uncertainty quantification for the sin-squared error. However, uncertainty quantification and sharp error guarantees for entries of the estimated eigenvector in the streaming setting remain largely unexplored. We derive a sharp Bernstein-type concentration bound for entries of the estimated eigenvector, matching the optimal error rate up to logarithmic factors. We also establish a central limit theorem for a suitably centered and scaled subset of the entries. To efficiently estimate the coordinate-wise variance, we introduce a provably consistent subsampling algorithm that leverages the median-of-means approach, empirically achieving accuracy similar to that of multiplier bootstrap methods while being significantly more computationally efficient. Numerical experiments demonstrate its effectiveness in providing reliable uncertainty estimates at a fraction of the computational cost of existing methods.
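The two ingredients named in the abstract, Oja's streaming update and a median-of-means variance estimate built from subsamples, can be sketched in a few lines. The block-splitting scheme, the sign alignment, the variance scaling, and the function names below are assumptions made for this illustration; it conveys the general idea, not the paper's exact estimator or its guarantees.

```python
import numpy as np

def oja(X, eta0=1.0, seed=0):
    """Oja's algorithm: a single streaming pass estimating the leading
    eigenvector of the data covariance. X is processed row by row."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for t, x in enumerate(X, start=1):
        w = w + (eta0 / t) * x * (x @ w)  # stochastic power-iteration step
        w /= np.linalg.norm(w)            # renormalize to the unit sphere
    return w

def mom_coordinate_ci(X, n_blocks=8, eta0=1.0):
    """Hypothetical sketch of coordinate-wise 95% intervals: rerun Oja on
    disjoint blocks of the stream and use a median-based spread of the
    block estimates as a robust per-coordinate variance proxy."""
    w_full = oja(X, eta0)
    block_est = []
    for block in np.array_split(X, n_blocks):
        w_b = oja(block, eta0)
        w_b = w_b * np.sign(w_b @ w_full)  # resolve the +/- sign ambiguity
        block_est.append(w_b)
    block_est = np.stack(block_est)
    # Each block sees only n/n_blocks samples, so a block estimate has
    # roughly n_blocks times the variance of the full-stream estimate.
    var = np.median((block_est - w_full) ** 2, axis=0) / n_blocks
    half = 1.96 * np.sqrt(var)             # ~97.5% normal quantile
    return w_full - half, w_full + half
```

On data with a clear spectral gap, the full-stream estimate aligns closely with the leading eigenvector, and the median over blocks keeps a few poorly converged block estimates from inflating the interval widths, which is the appeal of the median-of-means idea over a plain empirical variance.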