🤖 AI Summary
This work addresses the challenge of quantifying uncertainty for individual components of eigenvectors in streaming principal component analysis (PCA), a problem previously lacking rigorous statistical guarantees. We propose the first coordinate-wise confidence intervals for the leading eigenvector estimated by Oja's algorithm, departing from conventional global error metrics such as the sin²θ distance. Methodologically, we derive the first sharp Bernstein-type concentration inequality and coordinate-level central limit theorem for eigenvector components in streaming PCA. To enable practical inference, we introduce a median-of-means subsampling variance estimator that is both provably consistent and computationally efficient. Experiments on high-dimensional streaming data demonstrate accurate coverage and strong robustness of the proposed confidence intervals, with a runtime only 20–33% of that of the multiplier bootstrap, a substantial gain in scalability and interpretability for online statistical inference.
📝 Abstract
We propose a novel statistical inference framework for streaming principal component analysis (PCA) using Oja's algorithm, enabling the construction of confidence intervals for individual entries of the estimated eigenvector. Most existing work on streaming PCA focuses on providing sharp sin-squared error guarantees. Recently, there has been some interest in uncertainty quantification for the sin-squared error. However, uncertainty quantification and sharp error guarantees for entries of the estimated eigenvector in the streaming setting remain largely unexplored. We derive a sharp Bernstein-type concentration bound for entries of the estimated eigenvector, matching the optimal error rate up to logarithmic factors. We also establish a central limit theorem for a suitably centered and scaled subset of the entries. To efficiently estimate the coordinate-wise variance, we introduce a provably consistent subsampling algorithm that leverages the median-of-means approach, empirically achieving accuracy similar to that of multiplier bootstrap methods while being significantly more computationally efficient. Numerical experiments demonstrate its effectiveness in providing reliable uncertainty estimates at a fraction of the computational cost of existing methods.
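The two ingredients named in the abstract, Oja's streaming update and a median-of-means variance estimate built from subsamples, can be sketched in a few lines. The block-splitting scheme, the sign alignment, the variance scaling, and the function names below are assumptions made for this illustration; it conveys the general idea, not the paper's exact estimator or its guarantees.

```python
import numpy as np

def oja(X, eta0=1.0, seed=0):
    """Oja's algorithm: a single streaming pass estimating the leading
    eigenvector of the data covariance. X is processed row by row."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for t, x in enumerate(X, start=1):
        w = w + (eta0 / t) * x * (x @ w)  # stochastic power-iteration step
        w /= np.linalg.norm(w)            # renormalize to the unit sphere
    return w

def mom_coordinate_ci(X, n_blocks=8, eta0=1.0):
    """Hypothetical sketch of coordinate-wise 95% intervals: rerun Oja on
    disjoint blocks of the stream and use a median-based spread of the
    block estimates as a robust per-coordinate variance proxy."""
    w_full = oja(X, eta0)
    block_est = []
    for block in np.array_split(X, n_blocks):
        w_b = oja(block, eta0)
        w_b = w_b * np.sign(w_b @ w_full)  # resolve the +/- sign ambiguity
        block_est.append(w_b)
    block_est = np.stack(block_est)
    # Each block sees only n/n_blocks samples, so a block estimate has
    # roughly n_blocks times the variance of the full-stream estimate.
    var = np.median((block_est - w_full) ** 2, axis=0) / n_blocks
    half = 1.96 * np.sqrt(var)             # ~97.5% normal quantile
    return w_full - half, w_full + half
```

On data with a clear spectral gap, the full-stream estimate aligns closely with the leading eigenvector, and the median over blocks keeps a few poorly converged block estimates from inflating the interval widths, which is the appeal of the median-of-means idea over a plain empirical variance.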