Multimodal Language Modeling for High-Accuracy Single Cell Transcriptomics Analysis and Generation

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing single-cell pre-trained language models (PLMs) are disconnected from text PLMs, which hinders cross-modal tasks, and existing attempts to bridge the two modalities suffer from information loss or inadequate single-modal pre-training. To address this, the paper proposes scMMGPT, a unified multimodal generative pre-trained Transformer for joint cell and text modeling. scMMGPT integrates state-of-the-art cell and text PLMs, bridges them with dedicated cross-modal projectors, and is pre-trained on 27 million single cells, the largest corpus for multimodal cell-text PLMs to date. Experiments show substantial gains over baselines: an 84% relative improvement in textual discrepancy for cell description generation, 20.5% higher cell-type annotation accuracy, and a 4% improvement in k-NN accuracy for text-conditioned pseudo-cell generation. scMMGPT effectively bridges the cross-modal semantic gap and enables bidirectional knowledge transfer.

📝 Abstract
Pre-trained language models (PLMs) have revolutionized scientific research, yet their application to single-cell analysis remains limited. Text PLMs cannot process single-cell RNA sequencing data, while cell PLMs lack the ability to handle free text, restricting their use in multimodal tasks. Existing efforts to bridge these modalities often suffer from information loss or inadequate single-modal pre-training, leading to suboptimal performance. To address these challenges, we propose Single-Cell MultiModal Generative Pre-trained Transformer (scMMGPT), a unified PLM for joint cell and text modeling. scMMGPT effectively integrates the state-of-the-art cell and text PLMs, facilitating cross-modal knowledge sharing for improved performance. To bridge the text-cell modality gap, scMMGPT leverages dedicated cross-modal projectors, and undergoes extensive pre-training on 27 million cells -- the largest dataset for multimodal cell-text PLMs to date. This large-scale pre-training enables scMMGPT to excel in joint cell-text tasks, achieving an 84% relative improvement in textual discrepancy for cell description generation, 20.5% higher accuracy for cell type annotation, and a 4% improvement in $k$-NN accuracy for text-conditioned pseudo-cell generation, outperforming baselines.
Problem

Research questions and friction points this paper is trying to address.

Bridging text and single-cell RNA sequencing data modalities
Overcoming limitations of existing pre-trained language models
Enhancing accuracy in cell description and type annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified PLM for joint cell and text modeling
Leverages cross-modal projectors for modality integration
Extensive pre-training on 27 million cells
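
The cross-modal projector idea above can be illustrated with a minimal, hypothetical sketch (the function and variable names are illustrative, not from the paper): a linear projector maps cell embeddings into the text embedding space, and a CLIP-style symmetric contrastive (InfoNCE) loss pulls matched cell-text pairs together while pushing mismatched pairs apart.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(cell_emb, W, b):
    """Hypothetical linear cross-modal projector: cell space -> text space."""
    return cell_emb @ W + b

def info_nce(cell_z, text_z, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched cell-text pairs."""
    # L2-normalize so dot products are cosine similarities
    c = cell_z / np.linalg.norm(cell_z, axis=1, keepdims=True)
    t = text_z / np.linalg.norm(text_z, axis=1, keepdims=True)
    logits = c @ t.T / temperature      # (B, B); diagonal = matched pairs
    labels = np.arange(len(logits))
    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()
    # average of cell->text and text->cell directions
    return 0.5 * (xent(logits) + xent(logits.T))

B, d_cell, d_text = 8, 32, 16
cells = rng.normal(size=(B, d_cell))    # stand-in cell PLM embeddings
W = rng.normal(size=(d_cell, d_text)) * 0.1
b = np.zeros(d_text)
# stand-in text embeddings, constructed to be nearly aligned with their cells
texts = project(cells, W, b) + 0.01 * rng.normal(size=(B, d_text))

loss = info_nce(project(cells, W, b), texts)
print(f"contrastive loss: {loss:.4f}")
```

In an actual system the projector would be trained by gradient descent so that each cell's projection lands near the embedding of its paired text; this sketch only shows the loss geometry, under the assumption that the paper uses a CLIP-style alignment objective.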
Authors

Yaorui Shi (University of Science and Technology of China)
Jiaqi Yang (University of Science and Technology of China)
Sihang Li (University of Science and Technology of China)
Junfeng Fang (National University of Singapore)
Xiang Wang (University of Science and Technology of China)
Zhiyuan Liu (National University of Singapore)
Yang Zhang (National University of Singapore)