🤖 AI Summary
Severe gene expression dropout in spatial transcriptomics (ST) critically impedes histopathology image-guided expression prediction. Most existing approaches rely on single-cell RNA-seq (scRNA-seq) reference data, which makes them vulnerable to alignment errors, batch effects, and bias inherited from external data. To address this, we propose LGDiST, the first reference-free latent gene diffusion model for ST dropout. LGDiST encodes context genes into a biologically meaningful gene latent space and introduces a neighbor-conditioned generation mechanism, enabling end-to-end, image-guided expression imputation without scRNA-seq references. The model jointly optimizes ST latent representation learning, context-gene modeling, and cross-modal image-to-gene generation. Across 26 ST datasets, LGDiST achieves an average MSE 18% lower than the previous state of the art. In addition, completing ST data with LGDiST improves the MSE of six state-of-the-art expression-prediction methods by up to 10%. Ablation studies confirm that each component contributes substantially.
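Neighbor-conditioned latent diffusion can be illustrated with a generic DDPM-style training objective. The sketch below is a minimal NumPy illustration under stated assumptions and does not reproduce LGDiST's actual architecture. The linear context-gene encoder `E`, the k-nearest-neighbor mean used as the conditioning signal, the linear denoiser `W`, and the linear noise schedule are all hypothetical stand-ins. It shows spot latents being noised by the forward process, then a denoiser being scored on how well it predicts the injected noise given each spot's spatial neighbors.

```python
import numpy as np

# Generic DDPM-style sketch; NOT the LGDiST architecture. The encoder E, the
# k-NN conditioning, and the linear denoiser W are hypothetical stand-ins.

def knn_neighbor_mean(coords, latents, k):
    """Mean latent of each spot's k nearest spatial neighbors (self excluded)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a spot is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]        # (n_spots, k) neighbor indices
    return latents[idx].mean(axis=1)          # (n_spots, dim)

def q_sample(z0, t, alpha_bar, noise):
    """Forward diffusion: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bar[t][:, None]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * noise

def denoising_loss(z0, cond, W, alpha_bar, rng):
    """Noise-prediction MSE for a hypothetical linear denoiser [z_t, cond, t/T] @ W."""
    n, T = z0.shape[0], len(alpha_bar)
    t = rng.integers(0, T, size=n)            # one random timestep per spot
    noise = rng.standard_normal(z0.shape)
    zt = q_sample(z0, t, alpha_bar, noise)
    features = np.concatenate([zt, cond, (t / T)[:, None]], axis=1)
    return float(np.mean((features @ W - noise) ** 2))

# Toy data: Poisson counts for context genes and random 2-D spot coordinates.
rng = np.random.default_rng(0)
n_spots, n_context, dim, T = 50, 200, 16, 1000
context_expr = rng.poisson(1.0, size=(n_spots, n_context)).astype(float)
coords = rng.uniform(0.0, 10.0, size=(n_spots, 2))

E = rng.standard_normal((n_context, dim)) / np.sqrt(n_context)  # hypothetical encoder
z0 = np.log1p(context_expr) @ E                                 # clean latents
cond = knn_neighbor_mean(coords, z0, k=6)                        # neighbor conditioning

betas = np.linspace(1e-4, 0.02, T)       # linear schedule from the original DDPM paper
alpha_bar = np.cumprod(1.0 - betas)
W = np.zeros((2 * dim + 1, dim))         # untrained denoiser predicts zero noise
loss = denoising_loss(z0, cond, W, alpha_bar, rng)
print(f"noise-prediction loss: {loss:.3f}")
```

With the untrained, all-zero denoiser, the loss sits near 1, which is the variance of the injected standard-normal noise. Training would lower it by learning `W` so that the noisy latent and the neighbor context together explain the noise.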
📝 Abstract
Computer vision has proven to be a powerful tool for analyzing spatial transcriptomics (ST) data. However, current models that predict spatially resolved gene expression from histopathology images are significantly limited by data dropout. Most existing approaches to this problem rely on single-cell RNA sequencing references, which makes them dependent on alignment quality and external datasets and exposes them to batch effects and inherited dropout. In this paper, we address these limitations by introducing LGDiST, the first reference-free latent gene diffusion model for ST data dropout. We show that LGDiST outperforms the previous state of the art in gene expression completion, achieving an average mean squared error (MSE) 18% lower across 26 datasets. Furthermore, we demonstrate that completing ST data with LGDiST improves the gene expression prediction performance of six state-of-the-art methods by up to 10% in MSE. A key innovation of LGDiST is its use of context genes, previously considered uninformative, to build a rich and biologically meaningful genetic latent space. Our experiments show that removing key components of LGDiST, such as the context genes, the ST latent space, and the neighbor conditioning, leads to considerable drops in performance. These findings indicate that the full LGDiST architecture performs substantially better than any of its isolated components.
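Completion quality here is reported as MSE, and the headline figure is a relative reduction. One common way to measure this, assumed here rather than taken from the paper, is to hide a fraction of the observed nonzero entries, impute them, and score MSE only on the hidden entries. In the sketch below, `mask_observed`, `masked_mse`, `relative_reduction`, and the toy `gene_mean_impute` baseline are hypothetical helpers. Neither imputer is LGDiST or one of the methods compared in the paper.

```python
import numpy as np

def mask_observed(expr, frac, rng):
    """Hide a random fraction of nonzero entries to simulate extra dropout."""
    rows, cols = np.nonzero(expr)
    n_hide = int(frac * rows.size)
    pick = rng.choice(rows.size, size=n_hide, replace=False)
    mask = np.zeros(expr.shape, dtype=bool)
    mask[rows[pick], cols[pick]] = True
    corrupted = expr.copy()
    corrupted[mask] = 0.0                      # hidden entries now look like dropout
    return corrupted, mask

def masked_mse(pred, truth, mask):
    """MSE computed only on the held-out (masked) entries."""
    return float(np.mean((pred[mask] - truth[mask]) ** 2))

def relative_reduction(mse_baseline, mse_new):
    """Fractional MSE reduction; 0.18 means the new MSE is 18% lower."""
    return 1.0 - mse_new / mse_baseline

def gene_mean_impute(corrupted):
    """Hypothetical toy imputer: fill zeros with each gene's mean nonzero value."""
    filled = corrupted.copy()
    nz = corrupted > 0
    counts = nz.sum(axis=0)
    means = np.where(counts > 0, (corrupted * nz).sum(axis=0) / np.maximum(counts, 1), 0.0)
    zero_r, zero_c = np.nonzero(~nz)
    filled[zero_r, zero_c] = means[zero_c]
    return filled

# Toy log-normalized expression matrix: 100 spots x 30 genes.
rng = np.random.default_rng(0)
truth = np.log1p(rng.poisson(3.0, size=(100, 30)).astype(float))
corrupted, mask = mask_observed(truth, 0.2, rng)

mse_zero = masked_mse(corrupted, truth, mask)                     # zero-filling baseline
mse_mean = masked_mse(gene_mean_impute(corrupted), truth, mask)   # toy imputer
print(f"relative MSE reduction: {relative_reduction(mse_zero, mse_mean):.2%}")
```

Under this convention, an 18% average reduction corresponds to `relative_reduction` returning 0.18, meaning the new MSE is 0.82 times the baseline MSE.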