scE$^2$TM: Toward Interpretable Single-Cell Embedding via Topic Modeling

📅 2025-07-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing single-cell embedded topic models face two interpretability bottlenecks: their interpretability is assessed mainly through qualitative analysis, which leaves them exposed to "interpretation collapse", and they do not integrate external biological knowledge, which limits analytical performance. To address both, the authors propose scE2TM, an external knowledge-guided single-cell embedded topic model, together with a new quantitative interpretability benchmark comprising 10 metrics. scE2TM combines topic modeling with deep representation learning to produce cell embeddings and clusterings grounded in biological signals. Evaluated on 20 scRNA-seq datasets, scE2TM achieves significant clustering gains over 7 state-of-the-art methods, and its topics score well on diversity and on consistency with the underlying biological signals.

📝 Abstract

Recent advances in sequencing technologies have enabled researchers to explore cellular heterogeneity at single-cell resolution. Meanwhile, interpretability has gained prominence alongside the rapid increase in the complexity and performance of deep learning models. In recent years, topic models have been widely used for interpretable single-cell embedding learning and clustering analysis; we refer to these as single-cell embedded topic models. However, previous studies evaluated the interpretability of these models mainly through qualitative analysis, and the models suffer from the potential problem of interpretation collapse. Furthermore, their neglect of external biological knowledge constrains analytical performance. Here, we present scE2TM, an external knowledge-guided single-cell embedded topic model that provides high-quality cell embeddings and strong interpretability, contributing to comprehensive scRNA-seq data analysis. Our evaluation across 20 scRNA-seq datasets demonstrates that scE2TM achieves significant clustering performance gains compared to 7 state-of-the-art methods. In addition, we propose a new interpretability evaluation benchmark that introduces 10 metrics to quantitatively assess the interpretability of single-cell embedded topic models. The results show that the interpretations provided by scE2TM perform encouragingly in terms of diversity and consistency with the underlying biological signals, helping to reveal the underlying biological mechanisms.

Problem

Research questions and friction points this paper is trying to address.

Addresses interpretation collapse in single-cell embedded topic models

Integrates external biological knowledge to enhance analytical performance

Proposes quantitative metrics for evaluating model interpretability
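
To make the idea of a quantitative interpretability metric concrete, here is a minimal sketch of topic diversity, a measure commonly used for embedded topic models: the fraction of unique genes among the top-k genes of all topics. The abstract does not list the benchmark's 10 metrics or define interpretation collapse, so both the choice of metric and the reading of collapse as "topics converging on the same genes" are assumptions made for illustration. The function name `topic_diversity` and the toy matrices are hypothetical.

```python
import numpy as np


def topic_diversity(topic_gene: np.ndarray, top_k: int = 10) -> float:
    """Fraction of unique genes among the top_k genes of every topic.

    topic_gene: (n_topics, n_genes) matrix of topic-gene weights.
    Returns a value in (0, 1]; low values mean topics share top genes.
    Illustrative only: not confirmed to be one of the paper's metrics.
    """
    n_topics, n_genes = topic_gene.shape
    if not 0 < top_k <= n_genes:
        raise ValueError("top_k must be between 1 and the number of genes")
    # Indices of the top_k highest-weight genes in each topic.
    top_idx = np.argsort(-topic_gene, axis=1)[:, :top_k]
    return np.unique(top_idx).size / (n_topics * top_k)


# Toy data (hypothetical): three topics that all rank the same genes first.
collapsed = np.tile(np.array([[5.0, 4.0, 3.0, 0.0, 0.0, 0.0]]), (3, 1))

# Toy data (hypothetical): three topics with disjoint top genes.
distinct = np.array([
    [5.0, 4.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 5.0, 4.0],
])

print(topic_diversity(collapsed, top_k=2))
print(topic_diversity(distinct, top_k=2))
```

Under this reading, the collapsed topics score far lower than the distinct ones, which is the kind of difference a qualitative inspection of a few topics can miss.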

Innovation

Methods, ideas, or system contributions that make the work stand out.

External knowledge-guided topic modeling

Quantitative interpretability evaluation benchmark

High-quality cell embedding via scE2TM
