AI Summary
To address modality imbalance, information loss, and over-smoothing in multimodal integration of spatial transcriptomics and histopathological images, this paper proposes SENCA-st, a novel model featuring a shared encoder and a neighborhood cross-attention mechanism. This design preserves modality-specific characteristics while enhancing discriminability among structurally similar but functionally distinct tissue regions. By mitigating unimodal dependency, SENCA-st enables precise identification of tumor heterogeneity and functional compartments within the tumor microenvironment. Evaluated on multiple cancer datasets, SENCA-st consistently outperforms state-of-the-art methods (e.g., SPACEL, STAlign), achieving improvements of 8.2% to 14.6% in segmentation and clustering metrics. Furthermore, the model offers inherent interpretability and strong potential for clinical translation.
Abstract
Spatial transcriptomics is an emerging field that enables the identification of functional regions based on the spatial distribution of gene expression. Integrating the functional information in transcriptomic data with the structural information in histopathology images is an active research area, with applications in identifying tumor substructures associated with cancer drug resistance. Current methods for region segmentation from histopathology and spatial transcriptomics fall into two extremes. Some make spatial transcriptomics dominant, using histopathology features only to assist the processing of transcriptomic data. Others use vanilla contrastive learning, which makes histopathology images dominant because it promotes only the features common to both modalities and thereby loses functional information. In the first case the model gets lost in the noise of spatial transcriptomics; in the second it is overly smoothed and loses essential information. We therefore propose SENCA-st (Shared Encoder with Neighborhood Cross Attention), a novel architecture that preserves the features of both modalities. More importantly, it uses cross-attention to emphasize regions that are structurally similar in histopathology but functionally different in spatial transcriptomics. We demonstrate that our model surpasses state-of-the-art methods in detecting tumor heterogeneity and tumor microenvironment regions, a clinically crucial task.
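To make the idea of cross-attention over a spatial neighborhood concrete, the following is a minimal illustrative sketch, not the SENCA-st implementation. Everything in it is an assumption made for illustration: the function names (`k_nearest`, `neighborhood_cross_attention`), the choice of the image feature as the query and the gene-expression features as keys and values, the k-nearest-neighbor definition of a neighborhood, and the absence of learned projection matrices. The abstract does not specify any of these details.

```python
import math


def k_nearest(coords, i, k):
    """Return indices of the k spots closest to spot i (excluding i).

    Hypothetical neighborhood definition: Euclidean k-nearest neighbors.
    """
    dists = [(math.dist(coords[i], c), j) for j, c in enumerate(coords) if j != i]
    return [j for _, j in sorted(dists)[:k]]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def neighborhood_cross_attention(coords, img_feats, gene_feats, k=2):
    """Scaled dot-product cross-attention restricted to spatial neighbors.

    Assumption: for each spot, its image feature is the query, and the
    gene-expression features of the spot and its k nearest neighbors are
    both keys and values. No learned projections are applied.
    """
    d = len(img_feats[0])
    out = []
    for i in range(len(coords)):
        nbrs = [i] + k_nearest(coords, i, k)
        # Similarity between this spot's image feature and each neighbor's gene feature.
        scores = [
            sum(q * g for q, g in zip(img_feats[i], gene_feats[j])) / math.sqrt(d)
            for j in nbrs
        ]
        weights = softmax(scores)
        # Weighted average of the neighbors' gene features.
        attended = [
            sum(w * gene_feats[j][t] for w, j in zip(weights, nbrs))
            for t in range(d)
        ]
        out.append(attended)
    return out


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (0, 1), (5, 5)]          # toy spot positions
    img = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    gene = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    for row in neighborhood_cross_attention(coords, img, gene, k=2):
        print(row)
```

In this toy form, each output row is a convex combination of the gene-expression features in a spot's neighborhood, weighted by how well they match that spot's image feature. A trained model would add learned query, key, and value projections and typically multiple attention heads.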