🤖 AI Summary
This study addresses the challenge of transferring tissue microenvironment (niche) structure derived from spatial transcriptomics, which is molecularly rich but costly and scarce, to hematoxylin and eosin (H&E) histology images, which are widely available but carry a less granular signal. The authors propose a cross-modal knowledge distillation framework that uses paired spatial transcriptomics and H&E data during training to distill transcriptome-defined niche structure into a model that operates on H&E inputs alone. The approach is presented as the first cross-modal transfer of microenvironmental architecture from spatial transcriptomics to histological imaging. Across multiple tissue types and disease contexts, the distilled model agrees substantially better with transcriptomics-derived niche structure than unsupervised morphology-based baselines trained on identical image features. It recovers biologically interpretable cellular neighborhood composition, as confirmed by cell-type analysis, and can be applied to held-out tissue regions without any transcriptomic data at inference time.
📝 Abstract
Spatial transcriptomics provides a molecularly rich description of tissue organization, enabling unsupervised discovery of tissue niches: spatially coherent regions of distinct cell-type composition and function that are relevant to both biological research and clinical interpretation. However, spatial transcriptomics remains costly and scarce, while H&E histology is abundant but carries a less granular signal. We propose to use paired spatial transcriptomics and H&E data to transfer transcriptomics-derived niche structure to a histology-only model via cross-modal distillation. Across multiple tissue types and disease contexts, the distilled model achieves substantially higher agreement with transcriptomics-derived niche structure than unsupervised morphology-based baselines trained on identical image features, and it recovers biologically meaningful neighborhood composition, as confirmed by cell-type analysis. Paired data are required only during training. The resulting model can then be applied to held-out tissue regions using histology alone, without any transcriptomic input at inference time.
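The training and inference flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the synthetic data, the number of niches `K`, the use of k-means as the niche-discovery "teacher", the nearest-centroid "student", and all function names are assumptions chosen to keep the example dependency-free. A real pipeline would presumably use a dedicated niche-discovery method and a learned image encoder.

```python
import random
from itertools import permutations

K = 3  # number of niches in this toy example (an assumption, not from the paper)

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(x, centroids):
    return min(range(len(centroids)), key=lambda k: dist2(x, centroids[k]))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def farthest_first(points, k):
    # Deterministic initialisation: repeatedly pick the point farthest from those chosen.
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(dist2(p, c) for c in chosen)))
    return chosen

def kmeans(points, k, iters=20):
    # Plain Lloyd's algorithm; stands in for unsupervised niche discovery.
    centroids = [list(c) for c in farthest_first(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return centroids

def make_spots(n, rng):
    # Synthetic paired spots: a clean "expression" view and a noisier "image feature" view.
    expr_centers = [(5, 0, 0), (0, 5, 0), (0, 0, 5)]
    img_centers = [(1.5, 0), (0, 1.5), (-1.5, -1.5)]
    spots = []
    for _ in range(n):
        z = rng.randrange(K)
        expr = [c + rng.gauss(0, 0.5) for c in expr_centers[z]]
        img = [c + rng.gauss(0, 1.0) for c in img_centers[z]]
        spots.append((expr, img))
    return spots

def best_match_agreement(pred, ref):
    # Cluster ids are arbitrary, so score the best relabelling of the predictions.
    best = 0.0
    for perm in permutations(range(K)):
        hits = sum(perm[p] == r for p, r in zip(pred, ref))
        best = max(best, hits / len(ref))
    return best

rng = random.Random(0)
train, held_out = make_spots(300, rng), make_spots(200, rng)

# Teacher: niches discovered from the transcriptomic view of the paired training spots.
teacher_centroids = kmeans([expr for expr, _ in train], K)
train_niches = [nearest(expr, teacher_centroids) for expr, _ in train]

# Student: fitted on image features only, supervised by the teacher's niche labels.
student_centroids = [
    mean([img for (_, img), lab in zip(train, train_niches) if lab == j])
    for j in range(K)
]

# Inference on held-out spots uses image features alone.
student_pred = [nearest(img, student_centroids) for _, img in held_out]

# Evaluation only: held-out expression gives the reference niche labels.
reference = [nearest(expr, teacher_centroids) for expr, _ in held_out]
student_agreement = sum(p == r for p, r in zip(student_pred, reference)) / len(reference)

# Unsupervised baseline on the same image features, with no teacher signal.
baseline_centroids = kmeans([img for _, img in train], K)
baseline_pred = [nearest(img, baseline_centroids) for _, img in held_out]
baseline_agreement = best_match_agreement(baseline_pred, reference)

print(f"student agreement:  {student_agreement:.2f}")
print(f"baseline agreement: {baseline_agreement:.2f}")
```

The point of the sketch is the data flow: expression values enter only when building the teacher labels and when scoring, while the student and the baseline both see identical image features. The numbers it prints come from synthetic data and say nothing about the results reported in the paper.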