🤖 AI Summary
To address the trade-off between classification accuracy and computational efficiency in cell-type classification from single-cell RNA sequencing (scRNA-seq) data, this paper proposes EnProCell, a reference-based method built on low-dimensional projections. EnProCell combines principal component analysis (PCA), which captures high-variance structure, with multiple discriminant analysis (MDA), which captures class separability, into an ensemble projection; a deep neural network is then trained on this low-dimensional representation to classify cell types. The method also predicts cell types for query data with unknown labels from a reference dataset. Evaluated on four datasets produced by different single-cell sequencing technologies, EnProCell outperforms existing state-of-the-art methods without requiring additional computational resources or time: 98.91% accuracy (F1 = 98.64%) for reference-to-reference prediction and 99.52% accuracy (F1 = 99.07%) for query prediction.
📝 Abstract
Single-cell RNA sequencing (scRNA-seq) enables the study of cellular diversity at the single-cell level. It provides a global view of cell-type specification during the onset of biological mechanisms such as developmental processes and human organogenesis. Various statistical, machine learning, and deep learning-based methods have been proposed for cell-type classification. Most of these methods utilize unsupervised lower-dimensional projections obtained from large reference data. In this work, we propose a reference-based method for cell-type classification, called EnProCell. EnProCell first computes lower-dimensional projections that capture both high variance and class separability through an ensemble of principal component analysis and multiple discriminant analysis. In the second phase, EnProCell trains a deep neural network on the lower-dimensional representation of the data to classify cell types. The proposed method outperformed existing state-of-the-art methods when tested on four different datasets produced by different single-cell sequencing technologies. EnProCell showed higher accuracy (98.91) and F1 score (98.64) than other methods when predicting cell types for reference data from reference datasets. Similarly, EnProCell performed better than existing methods in predicting cell types for data with unknown cell types (query) from reference datasets (accuracy: 99.52; F1 score: 99.07). In addition to improved performance, the proposed methodology is simple and does not require additional computational resources or time. EnProCell is available at https://github.com/umar1196/EnProCell.
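The two-phase idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`pca_projection`, `mda_projection`, `fit_embedding`, `transform`) are hypothetical, the data are synthetic, and combining the two projections by concatenating their axes is only one plausible reading of "ensemble". A nearest-centroid rule stands in for the deep neural network purely to keep the example short; in the paper's method a neural network is trained on the embedding instead.

```python
import numpy as np

def pca_projection(X, n_components):
    """Return (mean, W); columns of W are the top principal axes."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes, sorted by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def mda_projection(X, y, n_components):
    """Return W; columns maximize between-class over within-class scatter."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of pinv(Sw) @ Sb; pinv guards against a singular Sw.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real

def fit_embedding(X_ref, y_ref, n_pca, n_mda):
    """Assumption: the ensemble concatenates PCA and MDA axes."""
    mean, W_pca = pca_projection(X_ref, n_pca)
    W_mda = mda_projection(X_ref, y_ref, n_mda)
    return mean, np.hstack([W_pca, W_mda])

def transform(X, mean, W):
    return (X - mean) @ W

# Synthetic stand-in for reference (labeled) and query (unlabeled) cells.
rng = np.random.default_rng(0)
n_classes, n_genes, n_cells = 3, 20, 60
centers = rng.normal(0, 3, size=(n_classes, n_genes))
y_ref = np.repeat(np.arange(n_classes), n_cells)
X_ref = centers[y_ref] + rng.normal(size=(len(y_ref), n_genes))
y_query = np.repeat(np.arange(n_classes), 10)  # held back for scoring only
X_query = centers[y_query] + rng.normal(size=(len(y_query), n_genes))

# Phase 1: fit the projection on the reference, apply it to both sets.
mean, W = fit_embedding(X_ref, y_ref, n_pca=5, n_mda=n_classes - 1)
Z_ref = transform(X_ref, mean, W)
Z_query = transform(X_query, mean, W)

# Phase 2 (stand-in): nearest class centroid instead of a neural network.
centroids = np.stack([Z_ref[y_ref == c].mean(axis=0) for c in range(n_classes)])
dists = ((Z_query[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == y_query).mean()
print(f"query accuracy on toy data: {accuracy:.2f}")
```

Note that MDA yields at most one fewer discriminant axis than the number of classes, which is why `n_mda` is set to `n_classes - 1` here. Real scRNA-seq matrices have thousands of genes, so a practical version would likely select genes or reduce dimensionality before building the scatter matrices.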