Enhanced Single-Cell RNA-seq Embedding through Gene Expression and Data-Driven Gene-Gene Interaction Integration

📅 2025-09-01
🤖 AI Summary
High-dimensional sparsity and technical noise in single-cell RNA-seq (scRNA-seq) data impede accurate cell-state representation, and most existing embedding methods neglect gene regulatory interactions. To address this, we propose the Enriched Cell-Leaf Graph (ECLG), which integrates a Cell-Leaf Graph of gene-gene interactions, learned via random forest models, with a K-nearest-neighbor cell similarity graph to form a single combined graph structure. A graph neural network then takes the ECLG as input and jointly encodes gene expression profiles and regulatory interaction information into cell embeddings. Evaluated on multiple benchmark datasets, ECLG is reported to outperform established tools, including Scanpy, Seurat, and scVI, on four tasks: rare cell-type identification, clustering accuracy, visualization separation, and pseudotemporal ordering inference. By explicitly modeling regulatory logic within a geometric deep learning framework, ECLG offers an interpretable and robust approach to functional single-cell analysis.

📝 Abstract
Single-cell RNA sequencing (scRNA-seq) provides unprecedented insights into cellular heterogeneity, enabling detailed analysis of complex biological systems at single-cell resolution. However, the high dimensionality and technical noise inherent in scRNA-seq data pose significant analytical challenges. While current embedding methods focus primarily on gene expression levels, they often overlook crucial gene-gene interactions that govern cellular identity and function. To address this limitation, we present a novel embedding approach that integrates both gene expression profiles and data-driven gene-gene interactions. Our method first constructs a Cell-Leaf Graph (CLG) using random forest models to capture regulatory relationships between genes, while simultaneously building a K-Nearest Neighbor Graph (KNNG) to represent expression similarities between cells. These graphs are then combined into an Enriched Cell-Leaf Graph (ECLG), which serves as input for a graph neural network to compute cell embeddings. By incorporating both expression levels and gene-gene interactions, our approach provides a more comprehensive representation of cellular states. Extensive evaluation across multiple datasets demonstrates that our method enhances the detection of rare cell populations and improves downstream analyses such as visualization, clustering, and trajectory inference. This integrated approach represents a significant advance in single-cell data analysis, offering a more complete framework for understanding cellular diversity and dynamics.
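The graph construction described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how cells are attached to the Cell-Leaf Graph, so the sketch assumes each cell is linked to the leaf it reaches in every tree of a fitted random forest (in scikit-learn, such indices are what a forest's `apply` method returns). The function names, the `leaf_ids` input, the toy data, and the use of Euclidean distance for the KNN graph are all hypothetical choices.

```python
import math

def knn_edges(expr, k):
    """Undirected cell-cell edges linking each cell to its k nearest cells."""
    edges = set()
    for i, xi in enumerate(expr):
        # Rank all other cells by Euclidean distance to cell i (assumed metric).
        others = sorted((j for j in range(len(expr)) if j != i),
                        key=lambda j: math.dist(xi, expr[j]))
        for j in others[:k]:
            edges.add((("cell", min(i, j)), ("cell", max(i, j))))
    return edges

def cell_leaf_edges(leaf_ids):
    """Bipartite edges linking each cell to the leaf it reaches in each tree.

    leaf_ids[i][t] is the (hypothetical) leaf index of cell i in tree t.
    """
    return {(("cell", i), ("leaf", t, leaf))
            for i, row in enumerate(leaf_ids)
            for t, leaf in enumerate(row)}

def enriched_graph(expr, leaf_ids, k):
    """Union of KNN edges and cell-leaf edges, stored as an adjacency dict."""
    adj = {}
    for u, v in knn_edges(expr, k) | cell_leaf_edges(leaf_ids):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# Toy data: 4 cells x 2 genes, plus leaf assignments from 2 trees.
expr = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
leaf_ids = [[0, 1], [0, 1], [1, 0], [1, 0]]
graph = enriched_graph(expr, leaf_ids, k=1)
```

On the toy data, cells 0 and 1 are connected both directly (a KNN edge) and through the two leaves they share. Real datasets would call for sparse matrices rather than Python sets.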
Problem

Research questions and friction points this paper is trying to address.

Integrating gene expression and interactions for scRNA-seq embedding
Overcoming high dimensionality and noise in single-cell data
Improving rare cell detection and downstream analysis accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates gene expression and data-driven gene-gene interactions
Constructs Enriched Cell-Leaf Graph using random forest models
Uses graph neural network for comprehensive cell embeddings
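To make the last point concrete, the sketch below shows one round of neighbor averaging on a graph that mixes cell and leaf nodes. It is only an illustration of how leaf nodes could let cells that share a leaf exchange information. It is not a trained graph neural network and not the paper's architecture: there are no learned weights or nonlinearities, and initializing each leaf as the mean of its attached cells is an assumption. All names (`propagate`, `mean_vec`, the node labels) are hypothetical.

```python
def mean_vec(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def propagate(adj, cell_feats):
    """One untrained message-passing round on a graph with cell and leaf nodes.

    adj maps each node to its set of neighbours; cell_feats maps cell nodes
    to feature vectors. Leaf nodes are assumed to neighbour only cells.
    """
    # Step 1: each leaf node summarises the cells attached to it.
    leaf_feats = {node: mean_vec([cell_feats[c] for c in nbrs])
                  for node, nbrs in adj.items() if node not in cell_feats}
    feats = {**cell_feats, **leaf_feats}
    # Step 2: each cell averages itself with its neighbouring cells and leaves.
    return {cell: mean_vec([feats[cell]] + [feats[n] for n in adj[cell]])
            for cell in cell_feats}

# Toy graph: c0 and c1 are KNN neighbours and share leafA; c2 sits alone in leafB.
adj = {
    "c0": {"c1", "leafA"},
    "c1": {"c0", "leafA"},
    "c2": {"leafB"},
    "leafA": {"c0", "c1"},
    "leafB": {"c2"},
}
cell_feats = {"c0": [0.0, 2.0], "c1": [2.0, 0.0], "c2": [6.0, 6.0]}
emb = propagate(adj, cell_feats)
```

After one round, `c0` and `c1` are pulled toward each other while `c2` is unchanged, because it shares neither a KNN edge nor a leaf with the other cells. A real model would add learned weight matrices and stack several such layers.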
Hojjat Torabi Goudarzi
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon 97331, United States
Maziyar Baran Pouyan
Accenture Bioinformatic Lab
Bioinformatics, Machine learning, High Throughput Sequencing Data Analysis, Single-Cell Data Analysis, Physiological Signal Proc