🤖 AI Summary
Existing single-cell RNA-seq clustering methods predominantly rely on threshold-based hard graphs, which discard continuous similarity information and introduce noisy inter-cluster edges that degrade graph neural network performance. To address this, we propose scSGC, a soft-graph clustering framework that (1) employs a zero-inflated negative binomial autoencoder to learn robust low-dimensional representations; (2) constructs a non-binary soft graph and introduces a dual-channel cut-aware embedding to model fine-grained cell similarities; and (3) incorporates optimal-transport-driven clustering optimization to explicitly suppress cross-cluster information propagation. Evaluated on ten benchmark datasets, scSGC significantly outperforms thirteen state-of-the-art methods in clustering accuracy, cell-type annotation consistency, and computational efficiency. Our results validate the effectiveness of soft-graph modeling and of integrating statistical modeling with graph representation learning.
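To make the zero-inflated negative binomial (ZINB) component more concrete, the following is a minimal sketch of the standard ZINB probability mass function and the corresponding negative log-likelihood. It is not the scSGC implementation: the function names, the scalar parameters `mu` (mean), `theta` (dispersion), and `pi` (dropout probability), and the toy counts are all assumptions made for illustration. In an autoencoder these parameters would typically be predicted per gene and per cell.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    parameterized by mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_pmf(x, mu, theta, pi):
    """ZINB probability: with probability pi the count is a structural
    zero (dropout); otherwise it is drawn from the negative binomial."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts (hypothetical helper)."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)


# Toy, made-up expression counts for one gene across eight cells.
toy_counts = [0, 0, 0, 0, 5, 0, 3, 0]
loss = zinb_nll(toy_counts, mu=2.0, theta=1.5, pi=0.3)
```

The extra mass that `pi` places on zero is what lets the model treat many observed zeros as dropout events rather than as evidence of low expression.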
📝 Abstract
Clustering analysis is fundamental in single-cell RNA sequencing (scRNA-seq) data analysis for elucidating cellular heterogeneity and diversity. Recent graph-based scRNA-seq clustering methods, particularly graph neural networks (GNNs), have made significant progress in tackling the challenges of high dimensionality, high sparsity, and frequent dropout events that lead to ambiguous cell population boundaries. However, their reliance on hard graph constructions derived from thresholded similarity matrices presents two challenges: (i) thresholding reduces intercellular relationships to binary edges (0 or 1), which restricts the capture of continuous similarity features among cells and leads to significant information loss; and (ii) hard graphs contain many inter-cluster connections, which can confuse GNN methods that rely heavily on graph structure, potentially causing erroneous message propagation and biased clustering outcomes. To tackle these challenges, we introduce scSGC, a Soft Graph Clustering method for single-cell RNA sequencing data, which aims to characterize continuous similarities among cells more accurately through non-binary edge weights, thereby mitigating the limitations of rigid data structures. The scSGC framework comprises three core components: (i) a zero-inflated negative binomial (ZINB)-based feature autoencoder; (ii) a dual-channel cut-informed soft graph embedding module; and (iii) an optimal transport-based clustering optimization module. Extensive experiments across ten datasets demonstrate that scSGC outperforms 13 state-of-the-art clustering models in clustering accuracy, cell type annotation, and computational efficiency. These results highlight its substantial potential to advance scRNA-seq data analysis and deepen our understanding of cellular heterogeneity.
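The contrast between a thresholded hard graph and a soft graph can be illustrated with a small sketch. Everything here is assumed for illustration only: the cosine similarity measure, the 0.5 threshold, the helper names, and the four toy expression vectors are not taken from scSGC, whose actual graph construction is not specified in this abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(cells):
    """Pairwise cell-to-cell similarity matrix."""
    n = len(cells)
    return [[cosine_similarity(cells[i], cells[j]) for j in range(n)]
            for i in range(n)]


def hard_graph(sim, threshold):
    """Binary adjacency: an edge exists only if similarity >= threshold."""
    n = len(sim)
    return [[1 if i != j and sim[i][j] >= threshold else 0
             for j in range(n)] for i in range(n)]


def soft_graph(sim):
    """Weighted adjacency: keep the non-negative similarity as edge weight."""
    n = len(sim)
    return [[max(sim[i][j], 0.0) if i != j else 0.0
             for j in range(n)] for i in range(n)]


# Four made-up cells with three genes each.
cells = [
    [5.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
    [0.0, 3.0, 4.0],
    [1.0, 2.0, 5.0],
]
sim = similarity_matrix(cells)
hard = hard_graph(sim, threshold=0.5)
soft = soft_graph(sim)
```

In this toy example the hard graph assigns 0 to both the (0, 2) and (0, 3) pairs, while the soft graph still records that cell 0 is more similar to cell 3 than to cell 2. That ordering is the kind of continuous information a binary threshold discards.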