scCluBench: Comprehensive Benchmarking of Clustering Algorithms for Single-Cell RNA Sequencing

📅 2025-12-02
🤖 AI Summary
Problem: Current single-cell RNA-seq (scRNA-seq) clustering methods lack a standardized benchmark, resulting in fragmented evaluation practices and insufficient integration of recent advances in artificial intelligence. Method: We establish the first standardized, multi-algorithm benchmarking platform, systematically evaluating 82 clustering methods (including classical algorithms, deep learning models, graph neural networks, and biology-informed foundation models) across 36 real-world scRNA-seq datasets. Evaluation employs quantitative metrics (ARI, NMI), visualization (t-SNE, UMAP), marker gene identification, cell-type annotation, and rigorous assessment of robustness, scalability, and biological interpretability. Contribution/Results: Our analysis delineates the applicability boundaries and generalization capabilities of diverse clustering paradigms, enabling evidence-based, transparent method selection. We release an open-source, fully reproducible framework with curated datasets, preprocessing pipelines, and evaluation scripts to advance rigor and standardization in scRNA-seq clustering research.
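The summary names ARI and NMI as the quantitative metrics. As a rough illustration of what those two scores measure, and not scCluBench's own evaluation script, the pure-Python sketch below computes both from a reference cell-type annotation and a predicted clustering. The function names and toy labels are hypothetical. The NMI here uses arithmetic-mean normalization, which is an assumption, since the summary does not say which variant the benchmark uses.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index computed from the contingency counts of two labelings."""
    n = len(true_labels)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treat as trivially identical
    joint = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_joint - expected) / (max_index - expected)


def normalized_mutual_info(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization (assumed variant)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    mutual_info = sum(
        (c / n) * log(c * n / (rows[t] * cols[p])) for (t, p), c in joint.items()
    )
    entropy_rows = -sum((c / n) * log(c / n) for c in rows.values())
    entropy_cols = -sum((c / n) * log(c / n) for c in cols.values())
    denom = (entropy_rows + entropy_cols) / 2
    if denom == 0:
        return 1.0  # both labelings are a single cluster
    return mutual_info / denom


# Toy example: hypothetical annotations for six cells.
reference = ["T", "T", "B", "B", "NK", "NK"]
predicted = [1, 1, 0, 0, 2, 2]  # same partition, different cluster names
ari = adjusted_rand_index(reference, predicted)
nmi = normalized_mutual_info(reference, predicted)
```

Both scores depend only on how cells are grouped, not on what the clusters are called, so a predicted clustering that matches the reference partition scores 1 (up to floating-point rounding) even when the cluster IDs differ from the annotation names.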

📝 Abstract
Cell clustering is crucial for uncovering cellular heterogeneity in single-cell RNA sequencing (scRNA-seq) data by identifying cell types and marker genes. Despite its importance, benchmarks for scRNA-seq clustering methods remain fragmented, often lacking standardized protocols and failing to incorporate recent advances in artificial intelligence. To fill these gaps, we present scCluBench, a comprehensive benchmark of clustering algorithms for scRNA-seq data. First, scCluBench provides 36 scRNA-seq datasets collected from diverse public sources, covering multiple tissues, which are uniformly processed and standardized to ensure consistency for systematic evaluation and downstream analyses. To evaluate performance, we collect and reproduce a range of scRNA-seq clustering methods, including traditional, deep learning-based, graph-based, and biological foundation models. We comprehensively evaluate each method both quantitatively and qualitatively, using core performance metrics as well as visualization analyses. Furthermore, we construct representative downstream biological tasks, such as marker gene identification and cell type annotation, to further assess the practical utility. scCluBench then investigates the performance differences and applicability boundaries of various clustering models across diverse analytical tasks, systematically assessing their robustness and scalability in real-world scenarios. Overall, scCluBench offers a standardized and user-friendly benchmark for scRNA-seq clustering, with curated datasets, unified evaluation protocols, and transparent analyses, facilitating informed method selection and providing valuable insights into model generalizability and application scope.
Problem

Research questions and friction points this paper is trying to address.

Benchmarking clustering algorithms for scRNA-seq data
Evaluating method performance across diverse datasets and tasks
Assessing robustness and applicability in biological analyses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized benchmark with 36 uniformly processed datasets
Evaluates diverse clustering methods including deep learning models
Assesses practical utility via downstream biological tasks
Authors

Ping Xu
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China

Zaitian Wang
Computer Network Information Center, Chinese Academy of Sciences
Data-centric AI, Large Language Models

Zhirui Wang
Aerospace Information Research Institute, Chinese Academy of Sciences
Remote sensing image interpretation, target detection, target recognition

Pengjiang Li
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China

Jiajia Wang
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China

Ran Zhang
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China

Pengfei Wang
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China

Yuanchun Zhou
Computer Network Information Center, CAS
Data Mining, Big Data Analysis