SPATIA: Multimodal Model for Prediction and Generation of Spatial Cell Phenotypes

📅 2025-07-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the challenge of unified modeling of single-cell multimodal data—encompassing morphology, gene expression, and spatial coordinates—by proposing the first cross-scale, spatially aware generative framework. Methodologically, it introduces a multi-scale Transformer architecture that ingests morphological and transcriptomic tokens, fuses them via cross-attention, and incorporates a spatially guided token merging mechanism alongside a diffusion-based decoder to enable high-resolution cellular image reconstruction. The framework supports joint representation learning across cellular, microenvironmental, and tissue-level scales. Evaluated on 12 downstream tasks, it consistently outperforms 13 baseline models, achieving significant gains in cell annotation, spatial clustering, gene imputation, and cross-modal prediction. Notably, it is the first method to generate biologically plausible cellular morphologies conditioned on transcriptomic states.
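The summary above centers on fusing morphological and transcriptomic tokens via cross-attention. As a minimal, framework-agnostic sketch of that fusion step (not SPATIA's actual implementation; token counts, dimensions, and the single-head form are illustrative assumptions), morphology tokens can attend over gene tokens like so:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    # queries: image-derived morphology tokens, shape (n_q, d)
    # keys_values: transcriptomic tokens, shape (n_kv, d)
    # Each morphology token attends over the gene tokens and
    # returns a gene-informed fused representation.
    scores = queries @ keys_values.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)        # rows sum to 1
    return weights @ keys_values

rng = np.random.default_rng(0)
morph_tokens = rng.normal(size=(16, 64))      # hypothetical image tokens
gene_tokens = rng.normal(size=(8, 64))        # hypothetical gene tokens
fused = cross_attention(morph_tokens, gene_tokens, d_k=64)
print(fused.shape)  # (16, 64)
```

In practice a learned model would project queries, keys, and values through separate weight matrices and use multiple heads; the sketch keeps only the attention mechanics that make the fusion cross-modal.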

📝 Abstract
Understanding how cellular morphology, gene expression, and spatial organization jointly shape tissue function is a central challenge in biology. Image-based spatial transcriptomics technologies now provide high-resolution measurements of cell images and gene expression profiles, but machine learning methods typically analyze these modalities in isolation or at limited resolution. We address the problem of learning unified, spatially aware representations that integrate cell morphology, gene expression, and spatial context across biological scales. This requires models that can operate at single-cell resolution, reason across spatial neighborhoods, and generalize to whole-slide tissue organization. Here, we introduce SPATIA, a multi-scale generative and predictive model for spatial transcriptomics. SPATIA learns cell-level embeddings by fusing image-derived morphological tokens and transcriptomic vector tokens using cross-attention and then aggregates them at niche and tissue levels using transformer modules to capture spatial dependencies. SPATIA incorporates token merging in its generative diffusion decoder to synthesize high-resolution cell images conditioned on gene expression. We assembled a multi-scale dataset consisting of 17 million cell-gene pairs, 1 million niche-gene pairs, and 10,000 tissue-gene pairs across 49 donors, 17 tissue types, and 12 disease states. We benchmark SPATIA against 13 existing models across 12 individual tasks, which span several categories including cell annotation, cell clustering, gene imputation, cross-modal prediction, and image generation. SPATIA achieves improved performance over all baselines and generates realistic cell morphologies that reflect transcriptomic perturbations.
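The abstract describes a diffusion decoder that synthesizes cell images conditioned on gene expression. A toy, DDPM-style reverse loop (the denoiser here is a hand-written stand-in, not the paper's trained network; image size, step count, and the conditioning projection are all assumptions) conveys the control flow:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_t, t, gene_vec):
    # Stand-in for a learned denoising network: blends the noisy
    # image with a crude tiling of the gene-expression vector so
    # every step is conditioned on the transcriptomic state.
    cond = np.resize(gene_vec, x_t.shape)  # hypothetical conditioning
    return 0.9 * x_t + 0.1 * cond

def reverse_diffusion(gene_vec, shape, steps=10):
    # Start from Gaussian noise and iteratively denoise toward an
    # image consistent with the conditioning vector.
    x = rng.normal(size=shape)
    for t in reversed(range(steps)):
        x = toy_denoiser(x, t, gene_vec)
    return x

gene_vec = rng.normal(size=32)             # one cell's expression profile
img = reverse_diffusion(gene_vec, shape=(28, 28))
print(img.shape)  # (28, 28)
```

A real decoder would use a learned noise schedule and a U-Net or transformer denoiser; the point of the sketch is only that the same conditioning signal enters every reverse step.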
Problem

Research questions and friction points this paper is trying to address.

How to integrate cell morphology, gene expression, and spatial context in a single model
How to learn unified representations that transfer across cellular, niche, and tissue scales
How to generate high-resolution cell images conditioned on gene expression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses image and gene data via cross-attention
Uses transformers for multi-scale spatial modeling
Generates cell images with diffusion decoder
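The multi-scale modeling listed above aggregates cell embeddings into niche- and tissue-level tokens. A simplified stand-in for spatially guided token merging (radius-based mean pooling is my assumption here, not the paper's actual merging rule) shows how spatial coordinates can drive the aggregation:

```python
import numpy as np

def niche_pool(cell_emb, coords, radius=1.5):
    # Mean-pool each cell's embedding with its spatial neighbors:
    # cells within `radius` of one another contribute to a shared
    # niche-level token.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    mask = dist <= radius                   # (n, n) neighbor indicator
    pooled = mask @ cell_emb                # sum neighbor embeddings
    return pooled / mask.sum(axis=1, keepdims=True)

emb = np.array([[1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])   # toy embeddings
xy = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0]])     # toy coordinates
print(niche_pool(emb, xy))
# Cells 0 and 1 are neighbors and average to [2, 0];
# cell 2 is isolated and keeps [10, 0].
```

SPATIA's transformer modules would operate on such pooled tokens to capture dependencies at the niche and tissue scales; any learned attention-based merging would replace the fixed radius rule sketched here.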