AI Summary
Spatial transcriptomics (ST) suffers from low spatial resolution, hindering fine-grained gene expression profiling; existing super-resolution methods often exhibit reconstruction uncertainty and mode collapse when integrating histological images with gene expression data. To address this, we propose the first cross-modal conditional diffusion model for ST super-resolution, integrating a multimodal disentanglement network with a cross-modal adaptive modulation mechanism. We design dynamic cross-attention to hierarchically model cell-to-tissue structural relationships and introduce a co-expression gene graph neural network to capture multi-gene synergistic interactions. Evaluated on three public datasets, our method significantly outperforms state-of-the-art approaches, effectively mitigating mode collapse while enhancing both the spatial resolution of gene expression maps and biological interpretability.
Abstract
Recent advances in spatial transcriptomics (ST) enable the characterization of spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering an in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with the gene expression of profiled tissue spots, but current methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, integrating histology images and gene expression to produce super-resolved ST maps remains challenging. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps under the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to exploit the complementary information in histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationships among multiple genes. Experiments on three public datasets show that our method outperforms state-of-the-art methods in ST super-resolution.
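To make the "cross-modal adaptive modulation" idea concrete, the sketch below shows a common realization of such conditioning: FiLM-style feature modulation, where histology features predict a per-channel scale and shift that modulate the gene-expression features inside the denoiser. This is a minimal illustration under that assumption; all function and variable names here are hypothetical and the paper's exact modulation mechanism may differ.

```python
import numpy as np

def adaptive_modulation(gene_feat, hist_feat, w_scale, w_shift):
    """FiLM-style cross-modal modulation (illustrative sketch).

    The histology features predict a per-channel scale and shift,
    which then modulate the gene-expression features so the image
    modality can steer the diffusion denoiser's representation.
    """
    scale = hist_feat @ w_scale          # (batch, channels)
    shift = hist_feat @ w_shift          # (batch, channels)
    return (1.0 + scale) * gene_feat + shift

rng = np.random.default_rng(0)
gene = rng.normal(size=(4, 8))           # spot-level gene features
hist = rng.normal(size=(4, 16))          # histology patch features
w_s = rng.normal(size=(16, 8)) * 0.1     # learned projection (scale)
w_b = rng.normal(size=(16, 8)) * 0.1     # learned projection (shift)

out = adaptive_modulation(gene, hist, w_s, w_b)
print(out.shape)  # (4, 8)
```

In practice the scale/shift projections would be learned jointly with the diffusion backbone; the `1.0 +` residual form keeps the modulation close to identity at initialization, which tends to stabilize conditional training.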