🤖 AI Summary
Existing single-cell pre-trained language models (PLMs) are disconnected from text PLMs, hindering cross-modal tasks; mainstream fusion approaches further incur information loss and inadequate unimodal representation learning. To address this, the authors propose scMMGPT, a unified multimodal generative pre-trained Transformer for single-cell data that aligns the cell and text modalities through dedicated cross-modal projectors. Pre-trained on 27 million single cells, the largest corpus for multimodal cell-text PLMs to date, scMMGPT shows substantial improvements: an 84% relative improvement in textual discrepancy for cell description generation, a 20.5% increase in cell-type annotation accuracy, and a 4% gain in k-NN accuracy for text-conditioned pseudo-cell generation. It thus bridges the cross-modal semantic gap and enables robust bidirectional knowledge transfer between the two modalities.
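The core mechanism described above is a cross-modal projector that maps cell-PLM embeddings into the text-PLM embedding space so the two modalities can be compared directly. The following is a minimal sketch of that idea, not the paper's actual architecture: the dimensions, the single linear layer, and the cosine-similarity alignment score are all illustrative assumptions.

```python
import math
import random

random.seed(0)

def project(x, W, b):
    """Linear cross-modal projector (illustrative): map a cell embedding x
    (length d_cell) into the text-PLM embedding space (length d_text)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def cosine(u, v):
    """Cosine similarity between two embeddings, a common score for
    measuring cell-text alignment after projection."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical dimensions: the cell PLM emits 8-d embeddings,
# the text PLM operates in a 4-d space.
d_cell, d_text = 8, 4
W = [[random.gauss(0, 0.1) for _ in range(d_cell)] for _ in range(d_text)]
b = [0.0] * d_text

cell_emb = [random.gauss(0, 1) for _ in range(d_cell)]
text_emb = [random.gauss(0, 1) for _ in range(d_text)]

projected = project(cell_emb, W, b)       # now comparable with text_emb
score = cosine(projected, text_emb)        # alignment score in [-1, 1]
```

In the real model the projector is trained so that matching cell-text pairs score higher than mismatched ones; the sketch only shows the shape of the computation.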
📝 Abstract
Pre-trained language models (PLMs) have revolutionized scientific research, yet their application to single-cell analysis remains limited. Text PLMs cannot process single-cell RNA sequencing data, while cell PLMs lack the ability to handle free text, restricting their use in multimodal tasks. Existing efforts to bridge these modalities often suffer from information loss or inadequate single-modal pre-training, leading to suboptimal performance. To address these challenges, we propose Single-Cell MultiModal Generative Pre-trained Transformer (scMMGPT), a unified PLM for joint cell and text modeling. scMMGPT effectively integrates state-of-the-art cell and text PLMs, facilitating cross-modal knowledge sharing for improved performance. To bridge the text-cell modality gap, scMMGPT leverages dedicated cross-modal projectors and undergoes extensive pre-training on 27 million cells -- the largest dataset for multimodal cell-text PLMs to date. This large-scale pre-training enables scMMGPT to excel in joint cell-text tasks, achieving an 84% relative improvement in textual discrepancy for cell description generation, 20.5% higher accuracy for cell type annotation, and a 4% improvement in $k$-NN accuracy for text-conditioned pseudo-cell generation, outperforming baselines.
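The $k$-NN accuracy metric cited for text-conditioned pseudo-cell generation can be understood as follows: each generated pseudo-cell is classified by a majority vote among its $k$ nearest real cells, and accuracy is the fraction whose predicted label matches the cell type requested in the text prompt. The sketch below is an illustrative reconstruction of that evaluation, not the paper's exact protocol; the function name, `k=3`, and Euclidean distance are assumptions.

```python
import math
from collections import Counter

def knn_accuracy(generated, intended_labels, reference, reference_labels, k=3):
    """Hypothetical k-NN evaluation for text-conditioned generation:
    classify each generated pseudo-cell by majority vote among its k
    nearest real reference cells (Euclidean distance), then report the
    fraction whose vote matches the cell type the prompt asked for."""
    correct = 0
    for cell, intended in zip(generated, intended_labels):
        # Distance from the generated cell to every real reference cell.
        dists = sorted(
            (math.dist(cell, ref), lab)
            for ref, lab in zip(reference, reference_labels)
        )
        votes = Counter(lab for _, lab in dists[:k])
        if votes.most_common(1)[0][0] == intended:
            correct += 1
    return correct / len(generated)

# Toy example: two real cell types ("A", "B") in a 2-d expression space.
reference = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
reference_labels = ["A", "A", "B", "B"]

# Two generated pseudo-cells, each prompted to be a specific type.
generated = [(0.0, 0.5), (5.0, 5.5)]
intended_labels = ["A", "B"]

acc = knn_accuracy(generated, intended_labels, reference, reference_labels)
```

A 4% improvement on this metric means the generated cells land measurably closer to real cells of the requested type than the baselines' outputs do.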