🤖 AI Summary
This paper addresses low-rank subspace learning under sample-wise heteroscedasticity, i.e., where each data sample has its own noise variance. It proposes ALPCAH, a subspace learning method that estimates the sample-wise noise variances and uses them to improve the estimate of the subspace basis. The method does not require the noise variances to be known and makes no distributional assumptions on the low-rank component. A soft rank constraint replaces a hard rank constraint, so the subspace dimension does not need to be specified in advance. The paper also develops LR-ALPCAH, a matrix-factorized variant that is much faster and more memory efficient, at the cost of requiring the subspace dimension to be known or estimated. Simulations and real-data experiments show that accounting for heteroscedasticity improves on existing algorithms.
📝 Abstract
Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction. However, some applications involve heterogeneous data that vary in quality due to noise characteristics associated with each data sample. Heteroscedastic methods aim to deal with such mixed data quality. This paper develops a subspace learning method, named ALPCAH, that can estimate the sample-wise noise variances and use this information to improve the estimate of the subspace basis associated with the low-rank structure of the data. Our method makes no distributional assumptions on the low-rank component and does not assume that the noise variances are known. Further, this method uses a soft rank constraint that does not require the subspace dimension to be known. Additionally, this paper develops a matrix-factorized version of ALPCAH, named LR-ALPCAH, that is much faster and more memory efficient, at the cost of requiring the subspace dimension to be known or estimated. Simulations and real data experiments show the effectiveness of accounting for data heteroscedasticity compared to existing algorithms. Code is available at https://github.com/javiersc1/ALPCAH.
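To make the core idea concrete (estimate a noise variance per sample, then let cleaner samples count more when fitting the subspace), here is a minimal NumPy sketch. It is not the ALPCAH or LR-ALPCAH algorithm from the paper or its repository. It is a simple alternating scheme written for illustration, and it assumes additive Gaussian noise, data stored as columns of a D x N matrix, and a known subspace dimension `d`. The names `weighted_subspace_sketch`, `subspace_error`, and `make_data` are hypothetical.

```python
import numpy as np


def subspace_error(U_true, U_est):
    """Normalized Frobenius distance between the projectors onto two subspaces."""
    P_true = U_true @ U_true.T
    P_est = U_est @ U_est.T
    return np.linalg.norm(P_true - P_est) / np.linalg.norm(P_true)


def weighted_subspace_sketch(Y, d, n_iters=20, floor=1e-8):
    """Alternate between a variance-weighted PCA and per-sample variance updates.

    Illustrative only: assumes Gaussian noise and a known dimension d.
    Y has shape (D, N), one sample per column.
    """
    D, N = Y.shape
    v = np.ones(N)  # equal variances, so the first pass is ordinary PCA
    for _ in range(n_iters):
        # Scale column i by 1/sqrt(v[i]) so that noisy samples count less.
        Yw = Y / np.sqrt(v)
        U, _, _ = np.linalg.svd(Yw, full_matrices=False)
        basis = U[:, :d]
        # Residual of each sample after projection onto the current basis.
        resid = Y - basis @ (basis.T @ Y)
        # Variance estimate: mean squared residual per coordinate, floored
        # to keep the weights finite.
        v = np.maximum(np.sum(resid**2, axis=0) / D, floor)
    return basis, v


def make_data(rng, D=50, d=3, n_good=50, n_bad=450, std_good=0.1, std_bad=3.0):
    """Synthetic low-rank data with two noise levels (a hypothetical test setup)."""
    U_true, _ = np.linalg.qr(rng.standard_normal((D, d)))
    N = n_good + n_bad
    X = U_true @ rng.standard_normal((d, N))
    std = np.concatenate([np.full(n_good, std_good), np.full(n_bad, std_bad)])
    Y = X + rng.standard_normal((D, N)) * std  # column i gets noise std[i]
    return Y, U_true, std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y, U_true, std = make_data(rng)
    U_pca = np.linalg.svd(Y, full_matrices=False)[0][:, :3]
    U_w, v_hat = weighted_subspace_sketch(Y, d=3)
    print("PCA subspace error:     ", subspace_error(U_true, U_pca))
    print("weighted subspace error:", subspace_error(U_true, U_w))
```

Like LR-ALPCAH as described in the abstract, this sketch needs the subspace dimension as an input. It does not reproduce the soft rank constraint that lets ALPCAH avoid that requirement.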