Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Single-cell RNA sequencing (scRNA-seq) data suffer from high technical noise, biological heterogeneity, and batch effects, which undermine the stability and interpretability of conventional dimensionality reduction methods such as PCA. To address these challenges, we propose SPCA-RMT: a nearly parameter-free framework that integrates random matrix theory (RMT)-based eigenvalue selection, biwhitening preprocessing (inspired by Sinkhorn-Knopp scaling), and sparse PCA. Its key innovations are an RMT-guided criterion that selects the sparsity level automatically, removing manual hyperparameter tuning, and joint variance stabilization across both gene and cell dimensions, which makes the estimated subspace more robust. Evaluated across seven scRNA-seq technologies and four sparse PCA algorithms, SPCA-RMT systematically improves reconstruction of the principal subspace and consistently outperforms PCA-, autoencoder-, and diffusion-based methods in cell-type classification.

📝 Abstract

Single-cell RNA-seq provides detailed molecular snapshots of individual cells but is notoriously noisy. Variability stems from biological differences, PCR amplification bias, limited sequencing depth, and low capture efficiency, making it challenging to adapt computational pipelines to heterogeneous datasets or evolving technologies. As a result, most studies still rely on principal component analysis (PCA) for dimensionality reduction, valued for its interpretability and robustness. Here, we improve upon PCA with a Random Matrix Theory (RMT)-based approach that guides the inference of sparse principal components using existing sparse PCA algorithms. We first introduce a novel biwhitening method, inspired by the Sinkhorn-Knopp algorithm, that simultaneously stabilizes variance across genes and cells. This enables the use of an RMT-based criterion to automatically select the sparsity level, rendering sparse PCA nearly parameter-free. Our mathematically grounded approach retains the interpretability of PCA while enabling robust, hands-off inference of sparse principal components. Across seven single-cell RNA-seq technologies and four sparse PCA algorithms, we show that this method systematically improves the reconstruction of the principal subspace and consistently outperforms PCA-, autoencoder-, and diffusion-based methods in cell-type classification tasks.
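
The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the paper's biwhitening method, whose details are not given here; it only shows the classical Sinkhorn-Knopp iteration that inspired it, applied to a strictly positive matrix V (assumed here to hold per-entry noise variance estimates), together with the standard Marchenko-Pastur upper bulk edge that RMT-based criteria typically compare eigenvalues against. The function names `sinkhorn_knopp` and `mp_upper_edge` are hypothetical.

```python
import math


def sinkhorn_knopp(V, n_iter=500, tol=1e-10):
    """Find positive row/column scalings r, c such that the matrix
    r[i] * V[i][j] * c[j] has every row sum equal to n_cols and every
    column sum equal to n_rows. V must be strictly positive."""
    m, n = len(V), len(V[0])
    r = [1.0] * m
    c = [1.0] * n
    for _ in range(n_iter):
        # Rescale rows so that each scaled row sums to n.
        r = [n / sum(V[i][j] * c[j] for j in range(n)) for i in range(m)]
        # Rescale columns so that each scaled column sums to m.
        c = [m / sum(r[i] * V[i][j] for i in range(m)) for j in range(n)]
        # Column sums are now exact; stop once row sums are close enough.
        err = max(
            abs(r[i] * sum(V[i][j] * c[j] for j in range(n)) - n)
            for i in range(m)
        )
        if err < tol:
            break
    return r, c


def mp_upper_edge(n_rows, n_cols, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for the eigenvalues of
    X X^T / n_cols, where X is n_rows x n_cols with i.i.d. noise of
    variance sigma2. Eigenvalues above this edge are candidate signal."""
    gamma = n_rows / n_cols
    return sigma2 * (1.0 + math.sqrt(gamma)) ** 2


# Toy example: a 2 x 3 matrix of assumed variance estimates.
V = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
r, c = sinkhorn_knopp(V)
scaled = [[r[i] * V[i][j] * c[j] for j in range(3)] for i in range(2)]
print("scaled row sums:", [sum(row) for row in scaled])
print("MP upper edge for a 100 x 400 matrix:", mp_upper_edge(100, 400))
```

After such a scaling the average noise variance is one in every row and column, which is the condition under which a unit-variance Marchenko-Pastur edge is a meaningful noise threshold. How the paper turns that threshold into a sparsity level is not described in this summary.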

Problem

Research questions and friction points this paper is trying to address.

Improving sparse PCA for noisy single-cell RNA-seq data

Automating sparsity selection using Random Matrix Theory

Enhancing cell-type classification across diverse technologies

Innovation

Methods, ideas, or system contributions that make the work stand out.

RMT-guided sparse PCA algorithm

Biwhitening stabilizes variance across genes and cells

Automated, nearly parameter-free sparsity selection

🔎 Similar Papers

scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding