MultiST: A Cross-Attention-Based Multimodal Model for Spatial Transcriptomic

📅 2026-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing spatial transcriptomics methods struggle to effectively integrate tissue morphology with gene expression, often resulting in ambiguous spatial domain boundaries. To address this limitation, this work proposes MultiST, the first framework that jointly models molecular expression, spatial topology, and tissue morphology within a unified architecture. By leveraging graph neural networks, cross-attention mechanisms, and adversarial alignment strategies, MultiST achieves deep multimodal integration. The method incorporates color-normalized histological image features and co-optimizes molecular–morphological dependencies to refine domain boundaries. Comprehensive experiments across 13 datasets demonstrate that MultiST significantly enhances the sharpness of spatial domain delineation, improves pseudotemporal trajectory stability, and increases the biological interpretability of inferred cell–cell interaction patterns.

📝 Abstract
Spatial transcriptomics (ST) enables transcriptome-wide profiling while preserving the spatial context of tissues, offering unprecedented opportunities to study tissue organization and cell-cell interactions in situ. Despite recent advances, existing methods often lack effective integration of histological morphology with molecular profiles, relying on shallow fusion strategies or omitting tissue images altogether, which limits their ability to resolve ambiguous spatial domain boundaries. To address this challenge, we propose MultiST, a unified multimodal framework that jointly models spatial topology, gene expression, and tissue morphology through cross-attention-based fusion. MultiST employs graph-based gene encoders with adversarial alignment to learn robust spatial representations, while integrating color-normalized histological features to capture molecular-morphological dependencies and refine domain boundaries. We evaluated the proposed method on 13 diverse ST datasets spanning two organs, including human brain cortex and breast cancer tissue. MultiST yields spatial domains with clearer and more coherent boundaries than existing methods, leading to more stable pseudotime trajectories and more biologically interpretable cell-cell interaction patterns. The MultiST framework and source code are available at https://github.com/LabJunBMI/MultiST.git.
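The abstract describes fusing gene-expression and histology representations via cross-attention, where each spot's molecular embedding attends over morphology features. Below is a minimal, dependency-free sketch of that general mechanism; all names, dimensions, and toy values are illustrative assumptions, not the authors' actual MultiST implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    """Each gene-embedding query attends over morphology keys/values
    (scaled dot-product attention), producing a fused embedding per spot."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[i] for w, v in zip(weights, values))
                      for i in range(len(values[0]))])
    return fused

# Toy example: 2 spots (gene embeddings) attend over 3 image-patch embeddings.
gene_emb = [[1.0, 0.0], [0.0, 1.0]]
morph_emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_attention(gene_emb, morph_emb, morph_emb)
```

In a full model the queries, keys, and values would first pass through learned linear projections, and the fused embeddings would feed downstream clustering of spatial domains; this sketch keeps only the attention arithmetic.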
Problem

Research questions and friction points this paper is trying to address.

spatial transcriptomics
multimodal integration
histological morphology
spatial domain boundaries
molecular-morphological alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-attention
multimodal fusion
spatial transcriptomics
graph-based encoding
histological morphology
🔎 Similar Papers
2024-04-19 · International Conference on Medical Image Computing and Computer-Assisted Intervention · Citations: 7
Wei Wang
Department of Computer Science, University of Cincinnati, Cincinnati, OH 45221, USA
Quoc-Toan Ly
Department of Computer Science, University of Cincinnati, Cincinnati, OH 45221, USA
Chong Yu
Assistant Professor of Computer Science, University of Cincinnati
AI, Federated Learning, Cybersecurity
Jun Bai
Assistant Professor
Computer-aided Drug Discovery, Medical Image Analysis, AI Therapeutic Target Identification