scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding

📅 2024-04-09
🏛️ International Conference on Database Systems for Advanced Applications
📈 Citations: 7
✨ Influential: 0
🤖 AI Summary
To address insufficient use of structural information, graph neural network (GNN) over-smoothing, and the computational inefficiency caused by the high dimensionality and sparsity of single-cell RNA-seq (scRNA-seq) data, this paper proposes scCDCG, a clustering framework that integrates deep cut-informed graph embedding, optimal transport-guided self-supervised learning, and a lightweight autoencoder. scCDCG introduces the first deep cut optimization mechanism for graph embedding to mitigate GNN over-smoothing, designs an optimal transport-based self-supervised paradigm tailored to the high-dimensional sparsity of scRNA-seq data, and models cellular expression structure efficiently through its three co-designed modules. Evaluated on six real-world scRNA-seq datasets, scCDCG achieves an average 5.2% improvement in clustering accuracy over seven state-of-the-art methods and runs 3.8× faster. The implementation is publicly available.

📝 Abstract
Single-cell RNA sequencing (scRNA-seq) is essential for unraveling cellular heterogeneity and diversity, offering invaluable insights for bioinformatics advancements. Despite this potential, traditional clustering methods for scRNA-seq data often neglect the structural information embedded in gene expression profiles, which is crucial for understanding cellular correlations and dependencies. Existing strategies, including graph neural networks, struggle with the inefficiency caused by the intrinsic high dimensionality and high sparsity of scRNA-seq data. To address these limitations, we introduce scCDCG (single-cell RNA-seq Clustering via Deep Cut-informed Graph), a novel framework for efficient and accurate clustering of scRNA-seq data that also exploits intercellular high-order structural information. scCDCG comprises three main components: (i) a graph embedding module utilizing deep cut-informed techniques, which effectively captures intercellular high-order structural information and overcomes the over-smoothing and inefficiency issues prevalent in prior graph neural network methods; (ii) a self-supervised learning module guided by optimal transport, tailored to the unique complexities of scRNA-seq data, specifically its high dimensionality and high sparsity; and (iii) an autoencoder-based feature learning module that reduces model complexity through effective dimension reduction and feature extraction. Our extensive experiments on 6 datasets demonstrate scCDCG's superior performance and efficiency compared to 7 established models, underscoring scCDCG's potential as a transformative tool in scRNA-seq data analysis. Our code is available at: https://github.com/XPgogogo/scCDCG.
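The cut-based idea behind component (i) can be illustrated with a generic relaxed normalized cut. The sketch below is not the scCDCG loss, which the abstract does not spell out; it assumes a symmetric cell-cell adjacency matrix and soft cluster assignments, and the function name `soft_normalized_cut` is hypothetical.

```python
import numpy as np

def soft_normalized_cut(adj, assign):
    """Relaxed normalized cut for soft cluster assignments (illustrative only).

    adj:    (n, n) symmetric, non-negative cell-cell adjacency matrix.
    assign: (n, c) soft assignments; each row sums to 1.
    Returns sum_k cut(C_k, rest) / vol(C_k), using cut_k = vol_k - assoc_k.
    """
    adj = np.asarray(adj, dtype=float)
    assign = np.asarray(assign, dtype=float)
    deg = adj.sum(axis=1)                                   # node degrees
    # Edge weight that stays inside each (soft) cluster.
    assoc = np.einsum("ik,ij,jk->k", assign, adj, assign)
    # Total degree (volume) attributed to each cluster.
    vol = assign.T @ deg
    # Guard against empty clusters to avoid division by zero.
    return float(np.sum(1.0 - assoc / np.maximum(vol, 1e-12)))
```

With hard one-hot assignments the value reduces to the classical normalized cut, so a partition that separates disconnected groups of cells scores 0, and mixing the groups raises the score.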
Problem

Research questions and friction points this paper is trying to address.

Captures intercellular high-order structural information from scRNA-seq data
Overcomes inefficiency due to high-dimension and high-sparsity of scRNA-seq
Provides efficient and accurate clustering through deep cut-informed graph embedding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep cut-informed graph embedding captures intercellular structural information
Optimal transport guides self-supervised learning for sparse data
Autoencoder reduces dimensions and extracts features efficiently
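One common way to let optimal transport guide self-supervised clustering is to convert the model's cell-to-cluster scores into balanced soft targets with Sinkhorn iterations. Whether scCDCG uses exactly this formulation is an assumption here; the uniform marginals, the `epsilon` temperature, and the function name `sinkhorn_targets` are illustrative choices, not taken from the paper.

```python
import numpy as np

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def sinkhorn_targets(scores, epsilon=0.1, n_iters=100):
    """Balanced soft targets via entropy-regularised OT (log-domain Sinkhorn).

    scores: (n_cells, n_clusters) similarity or logit matrix.
    Assumes uniform marginals: every cell carries mass 1/n and every
    cluster receives mass 1/c. Returns an (n, c) matrix whose rows sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    n, c = scores.shape
    log_k = scores / epsilon              # log of the Gibbs kernel
    log_r = np.full(n, -np.log(n))        # log row marginal
    log_c = np.full(c, -np.log(c))        # log column marginal
    f = np.zeros(n)                       # row scaling (log domain)
    g = np.zeros(c)                       # column scaling (log domain)
    for _ in range(n_iters):
        g = log_c - _logsumexp(log_k + f[:, None], axis=0)
        f = log_r - _logsumexp(log_k + g[None, :], axis=1)
    plan = np.exp(log_k + f[:, None] + g[None, :])
    return plan * n                       # rescale so each row sums to 1
```

The returned matrix can serve as a target distribution for a cross-entropy or KL loss against the model's predicted assignments. The equal-mass column constraint discourages the degenerate solution in which every cell collapses into one cluster.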
Ping Xu
Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Zhiyuan Ning
Westlake University
Graph Machine Learning, Knowledge Graphs, Large Language Models
Meng Xiao
Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Guihai Feng
State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences; Institute for Stem Cell and Regenerative Medicine, Chinese Academy of Sciences; Beijing Institute for Stem Cell and Regenerative Medicine, Chinese Academy of Sciences
Xin Li
State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences; Institute for Stem Cell and Regenerative Medicine, Chinese Academy of Sciences; Beijing Institute for Stem Cell and Regenerative Medicine, Chinese Academy of Sciences
Yuanchun Zhou
Computer Network Information Center, CAS
Data Mining, Big Data Analysis
Pengfei Wang
Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences