🤖 AI Summary
Unsupervised cell type identification struggles to distinguish closely related cell types, largely because existing methods neglect biologically meaningful cell–gene associations. To address this, we propose scRCL, a framework that combines dual-contrast distribution alignment with a gene-correlation-driven representation refinement module. scRCL explicitly models the synergistic cell–gene structural relationships and further incorporates gene co-expression priors with graph-augmented embedding learning. Evaluated on multiple benchmark single-cell RNA-seq and spatial transcriptomics datasets, scRCL consistently outperforms state-of-the-art methods, achieving absolute improvements of 3.2–9.7% in clustering accuracy. Moreover, the identified cell clusters exhibit strong biological coherence and yield interpretable, biologically meaningful gene expression signatures.
📝 Abstract
Unsupervised cell type identification is crucial for uncovering and characterizing heterogeneous populations in single-cell omics studies. Although a range of clustering methods has been developed, most focus exclusively on intrinsic cellular structure and ignore the pivotal role of cell-gene associations, which limits their ability to distinguish closely related cell types. To this end, we propose a Refinement Contrastive Learning framework (scRCL) that explicitly incorporates cell-gene interactions to derive more informative representations. Specifically, we introduce two contrastive distribution alignment components that reveal reliable intrinsic cellular structures by effectively exploiting cell-cell structural relationships. Additionally, we develop a refinement module that integrates gene-correlation structure learning to enhance cell embeddings by capturing underlying cell-gene associations. This module strengthens connections between cells and their associated genes, refining representation learning to exploit biologically meaningful relationships. Extensive experiments on several single-cell RNA-seq and spatial transcriptomics benchmark datasets demonstrate that our method consistently outperforms state-of-the-art baselines in cell-type identification accuracy. Moreover, downstream biological analyses confirm that the recovered cell populations exhibit coherent gene-expression signatures, further validating the biological relevance of our approach. The code is available at https://github.com/THPengL/scRCL.
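The abstract does not spell out the loss functions or the refinement rule, so the following is only a minimal NumPy sketch of the two generic ingredients it names: a contrastive objective over two views of the cell embeddings, and a gene-gene correlation structure used to refine expression. The function names (`info_nce`, `gene_correlation`, `refine_with_gene_correlation`), the InfoNCE form, the temperature value, and the correlation-weighted smoothing are all assumptions made for illustration, not the scRCL implementation; see the linked repository for the authors' code.

```python
import numpy as np


def info_nce(z1, z2, temperature=0.5):
    """Generic InfoNCE loss (an assumed stand-in for a contrastive alignment term).

    Row i of z1 and row i of z2 are treated as a positive pair; every other
    row of z2 acts as a negative for row i of z1.
    """
    # L2-normalise so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to the positive pairs.
    return -np.mean(np.diag(log_prob))


def gene_correlation(X):
    """Pearson correlation between genes for a (cells x genes) matrix X.

    Genes with zero variance would produce NaN entries and should be
    filtered out beforehand.
    """
    return np.corrcoef(X, rowvar=False)


def refine_with_gene_correlation(X, C):
    """Hypothetical refinement: smooth each gene over positively correlated genes."""
    W = np.clip(C, 0.0, None)             # keep only positive correlations
    W = W / W.sum(axis=1, keepdims=True)  # row-normalise; the diagonal is 1, so sums are >= 1
    return X @ W.T                        # refined[:, g] = sum_h W[g, h] * X[:, h]
```

In such a setup, the two inputs to `info_nce` would typically be embeddings of the same cells under two augmentations or two encoders, while the refined matrix would feed back into the encoder; how scRCL actually couples these components is not stated in the abstract.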