Towards Applying Large Language Models to Complement Single-Cell Foundation Models

📅 2025-07-14
🤖 AI Summary
Existing single-cell foundation models (e.g., scGPT) struggle to leverage vast biological text knowledge, and prevailing approaches treat large language models (LLMs) as substitutes—not complements—to specialized single-cell models. Method: We propose scMPT, the first framework to systematically identify key biological factors—such as gene functions and pathway descriptions—that enable LLMs to improve single-cell analysis. scMPT introduces a multimodal feature fusion mechanism that jointly models LLM-derived textual knowledge and scGPT’s cell-gene representations. Contribution/Results: This complementary fusion paradigm mitigates performance volatility inherent in individual models, substantially enhancing cross-dataset generalizability and robustness. Extensive experiments demonstrate that scMPT consistently outperforms standalone scGPT and LLM baselines across downstream tasks—including clustering, cell-type annotation, and batch correction—validating the efficacy of knowledge augmentation and synergistic modeling.

📝 Abstract
Single-cell foundation models such as scGPT represent a significant advancement in single-cell omics, achieving state-of-the-art performance on various downstream biological tasks. However, these models are inherently limited in that a vast amount of information in biology exists as text, which they are unable to leverage. Several recent works have therefore proposed using LLMs as an alternative to single-cell foundation models, achieving competitive results. However, there is little understanding of what factors drive this performance, and the focus has been on using LLMs as an alternative, rather than a complementary approach, to single-cell foundation models. In this study, we therefore investigate what biological insights contribute toward the performance of LLMs when applied to single-cell data, and introduce scMPT, a model that leverages synergies between scGPT and single-cell representations from LLMs that capture these insights. scMPT demonstrates stronger, more consistent performance than either of its component models, which frequently show large performance gaps relative to each other across datasets. We also experiment with alternate fusion methods, demonstrating the potential of combining specialized reasoning models with scGPT to improve performance. This study ultimately showcases the potential for LLMs to complement single-cell foundation models and drive improvements in single-cell analysis.
Problem

Research questions and friction points this paper is trying to address.

Bridging text-based biology knowledge with single-cell omics models
Understanding factors driving LLM performance in single-cell analysis
Integrating LLMs and scGPT for enhanced single-cell model synergy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining scGPT with LLMs for single-cell analysis
Introducing scMPT to leverage model synergies
Fusing specialized reasoning models with scGPT
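
The idea of combining two representations of the same cells can be illustrated with a generic late-fusion sketch. This is not the scMPT architecture, whose details are not given on this page. It only assumes that each cell has one embedding from a single-cell model and one from a text model. All names (`fuse_embeddings`, `fit_centroids`, `predict`), the random toy data, and the nearest-centroid classifier are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm so neither source dominates by magnitude."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def fuse_embeddings(cell_emb, text_emb):
    """Late fusion by concatenating per-cell embeddings from two sources.

    cell_emb: (n_cells, d1) array, standing in for single-cell model embeddings.
    text_emb: (n_cells, d2) array, standing in for LLM-derived embeddings.
    """
    if cell_emb.shape[0] != text_emb.shape[0]:
        raise ValueError("both sources must describe the same cells")
    return np.concatenate([l2_normalize(cell_emb), l2_normalize(text_emb)], axis=1)


def fit_centroids(features, labels):
    """Compute one mean vector per class (a nearest-centroid classifier)."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(features, classes, centroids):
    """Assign each row to the class with the closest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Toy data only: random vectors stand in for real embeddings.
rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)
cell_emb = rng.normal(size=(40, 8)) + labels[:, None] * 2.0
text_emb = rng.normal(size=(40, 16)) + labels[:, None] * 2.0

fused = fuse_embeddings(cell_emb, text_emb)
classes, centroids = fit_centroids(fused, labels)
pred = predict(fused, classes, centroids)
```

Normalizing each source before concatenation is one simple way to keep the larger-magnitude embedding from dominating distance computations. Other fusion strategies, such as weighting or learned projections, would slot into `fuse_embeddings` without changing the rest of the sketch.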
Steven Palayew
University of Toronto, Vector Institute
Bo Wang
University of Toronto, Vector Institute, Peter Munk Cardiac Centre, AI Hub
Gary Bader
Professor of Molecular Genetics and Computer Science, The Donnelly Centre, University of Toronto
Computational Biology, Systems Biology