🤖 AI Summary
Traditional PCA and subspace clustering methods degrade on multi-subspace data with sample-wise heteroscedastic noise. To address this, we propose a heteroscedastic subspace clustering framework that jointly estimates sample-level noise variances and low-rank subspace bases. This work is the first to incorporate heteroscedastic modeling into Union-of-Subspaces (UoS) clustering, thereby relaxing the restrictive homoscedasticity assumption. Building upon the K-Subspaces (KSS) framework, we extend LR-ALPCAH by introducing subspace-adaptive weighted low-rank decomposition and an alternating optimization algorithm. Extensive experiments on synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art approaches, including KSS, SSC, and LRR, achieving over 15% improvement in clustering accuracy under highly heteroscedastic noise conditions. Moreover, it enhances robustness and interpretability when handling mixed-quality data.
📝 Abstract
Principal component analysis (PCA) is a key tool for dimensionality reduction. Various methods, such as K-Subspaces (KSS), have been proposed to extend PCA to the union of subspaces (UoS) setting for clustering data that come from multiple subspaces. However, some applications involve heterogeneous data whose quality varies because each data sample has its own noise characteristics. Heteroscedastic methods aim to deal with such mixed data quality. This paper develops a heteroscedastic subspace clustering method, named ALPCAHUS, that estimates the sample-wise noise variances and uses this information to improve the estimates of the subspace bases associated with the low-rank structure of the data. The clustering algorithm builds on KSS principles by extending the recently proposed heteroscedastic PCA method, named LR-ALPCAH, to clusters with heteroscedastic noise in the UoS setting. Simulations and real-data experiments show the effectiveness of accounting for data heteroscedasticity compared to existing clustering algorithms. Code is available at https://github.com/javiersc1/ALPCAHUS.
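For orientation, the baseline K-Subspaces alternation that the abstract builds on can be sketched as follows. This is a minimal illustration of plain KSS, which treats all samples as equally noisy; it is not the ALPCAHUS implementation. The function name `kss`, its parameters, and the random initialization are assumptions made for this sketch. Per the abstract, ALPCAHUS would replace the plain PCA update with LR-ALPCAH, which also estimates sample-wise noise variances.

```python
import numpy as np

def kss(Y, n_clusters, dim, n_iters=50, seed=0):
    """Plain K-Subspaces on the columns of Y (features x samples).

    Hypothetical helper for illustration only; not the ALPCAHUS code.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = Y.shape
    # Random orthonormal starting bases, one per cluster.
    bases = [np.linalg.qr(rng.standard_normal((n_features, dim)))[0]
             for _ in range(n_clusters)]
    labels = np.full(n_samples, -1)
    for _ in range(n_iters):
        # Assignment step: choose the subspace capturing the most energy,
        # which equals the smallest residual when the bases are orthonormal.
        energy = np.stack([np.sum((U.T @ Y) ** 2, axis=0) for U in bases])
        new_labels = np.argmax(energy, axis=0)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: refit each basis by PCA (truncated SVD) on its cluster.
        for k in range(n_clusters):
            Yk = Y[:, labels == k]
            if Yk.shape[1] >= dim:  # keep the old basis if the cluster is too small
                U, _, _ = np.linalg.svd(Yk, full_matrices=False)
                bases[k] = U[:, :dim]
    return labels, bases
```

Calling `labels, bases = kss(Y, n_clusters=2, dim=3)` returns one label per column of `Y` and one orthonormal basis per cluster. Like K-means, this alternation depends on its initialization, so in practice it is usually run from several seeds.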