Latent Gene Diffusion for Spatial Transcriptomics Completion

📅 2025-09-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
Severe gene expression dropout in spatial transcriptomics (ST) critically impedes histopathology image-guided expression prediction. Existing approaches rely on single-cell RNA-seq (scRNA-seq) reference data, which makes them vulnerable to alignment errors, batch effects, and external data bias. To address this, we propose LGDiST, the first reference-free latent gene diffusion model for ST data dropout. LGDiST encodes context genes, previously considered uninformative, into a biologically meaningful gene latent space and introduces a neighbor-conditioned generation mechanism, enabling expression completion without scRNA-seq references. The model combines an ST latent space, context-gene modeling, and neighbor conditioning. Evaluated across 26 ST datasets, LGDiST achieves an average MSE 18% lower than the previous state of the art in gene expression completion, and completing ST data with LGDiST improves the gene expression prediction performance of six state-of-the-art methods by up to 10% in MSE. Ablation studies confirm the substantial contribution of each component.

📝 Abstract
Computer Vision has proven to be a powerful tool for analyzing Spatial Transcriptomics (ST) data. However, current models that predict spatially resolved gene expression from histopathology images suffer from significant limitations due to data dropout. Most existing approaches rely on single-cell RNA sequencing references, making them dependent on alignment quality and external datasets while also risking batch effects and inherited dropout. In this paper, we address these limitations by introducing LGDiST, the first reference-free latent gene diffusion model for ST data dropout. We show that LGDiST outperforms the previous state-of-the-art in gene expression completion, with an average Mean Squared Error that is 18% lower across 26 datasets. Furthermore, we demonstrate that completing ST data with LGDiST improves the gene expression prediction performance of six state-of-the-art methods by up to 10% in MSE. A key innovation of LGDiST is the use of context genes, previously considered uninformative, to build a rich and biologically meaningful genetic latent space. Our experiments show that removing key components of LGDiST, such as the context genes, the ST latent space, and the neighbor conditioning, leads to considerable drops in performance. These findings underscore that the full architecture of LGDiST achieves substantially better performance than any of its isolated components.
Problem

Addressing data dropout in spatial transcriptomics without external references
Improving gene expression prediction accuracy using latent diffusion models
Leveraging context genes to create a biologically meaningful genetic latent space
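To make "dropout" and "completion" concrete, the sketch below uses a simple spatial baseline, not LGDiST: a hypothetical k-nearest-neighbour mean imputer (`knn_mean_complete`, `masked_mse` are illustrative names) on a toy 3×3 grid of spots. It assumes dropout shows up as zeros with a known mask, that measured values are kept unchanged, and that MSE is scored only on held-out entries. It also shows the arithmetic behind a relative MSE reduction such as the reported 18%, although the paper's exact masking and averaging protocol is not described here.

```python
import math


def knn_mean_complete(coords, expr, observed, k=3):
    """Fill unobserved entries of a spots x genes matrix with the mean of the
    k nearest spots where that gene is observed (illustrative baseline only)."""
    n_spots, n_genes = len(coords), len(expr[0])
    completed = [row[:] for row in expr]
    for i in range(n_spots):
        # Rank the other spots by Euclidean distance to spot i (stable on ties).
        order = sorted(
            (j for j in range(n_spots) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        for g in range(n_genes):
            if observed[i][g]:
                continue  # keep measured values untouched
            donors = [expr[j][g] for j in order if observed[j][g]][:k]
            completed[i][g] = sum(donors) / len(donors) if donors else 0.0
    return completed


def masked_mse(pred, target, mask):
    """Mean squared error restricted to entries where mask is True."""
    errs = [
        (p - t) ** 2
        for prow, trow, mrow in zip(pred, target, mask)
        for p, t, m in zip(prow, trow, mrow)
        if m
    ]
    return sum(errs) / len(errs)


# Toy data: two genes with smooth spatial gradients on a 3x3 grid.
coords = [(x, y) for x in range(3) for y in range(3)]
truth = [[float(x + y), float(2 * x)] for x, y in coords]
observed = [[True, True] for _ in coords]
observed[4][0] = False  # spot (1, 1), gene 0 dropped out
observed[8][1] = False  # spot (2, 2), gene 1 dropped out
dropped = [[v if m else 0.0 for v, m in zip(row, mrow)]
           for row, mrow in zip(truth, observed)]
held_out = [[not m for m in mrow] for mrow in observed]

completed = knn_mean_complete(coords, dropped, observed, k=4)
mse_zero = masked_mse(dropped, truth, held_out)   # dropout left as zeros: 10.0
mse_knn = masked_mse(completed, truth, held_out)  # after completion: 2.0
reduction = (mse_zero - mse_knn) / mse_zero       # relative MSE reduction: 0.8
print(mse_zero, mse_knn, reduction)
```

The corner spot (2, 2) is completed poorly because all its neighbours lie on one side; a learned, neighbour-conditioned model is meant to do better than this kind of averaging.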
Innovation

Reference-free latent gene diffusion model
Uses context genes to build a genetic latent space
Neighbor conditioning enhances gene expression completion
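The innovation list names latent diffusion with neighbor conditioning. The sketch below shows only the generic DDPM machinery applied to a latent vector, under loudly stated assumptions: the linear beta schedule, the mean-of-neighbours condition, concatenation as the conditioning mechanism, and every function name are hypothetical illustrations, not LGDiST's published architecture. The encoder that maps target and context genes to latents, and the trained noise-prediction network, are omitted.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear DDPM beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(z0, alpha_bar, noise):
    """Forward process: z_t = sqrt(a_bar) * z0 + sqrt(1 - a_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, noise)]


def predict_z0(zt, alpha_bar, eps_hat):
    """Invert the forward process given a (predicted) noise vector."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(z - b * e) / a for z, e in zip(zt, eps_hat)]


def neighbor_condition(neighbor_latents):
    """Hypothetical condition: element-wise mean of neighbouring spots' latents."""
    n = len(neighbor_latents)
    return [sum(col) / n for col in zip(*neighbor_latents)]


def denoiser_input(zt, t, cond):
    """Hypothetical denoiser input: noisy latent + timestep + neighbour condition."""
    return zt + [float(t)] + cond


def eps_mse(eps_hat, eps):
    """Standard DDPM noise-prediction loss for a single sample."""
    return sum((a - b) ** 2 for a, b in zip(eps_hat, eps)) / len(eps)


rng = random.Random(0)
alpha_bars = linear_alpha_bars(1000)
z0 = [0.5, -1.0, 2.0, 0.0]                     # toy latent of the spot to complete
neighbors = [[0.4, -0.8, 1.9, 0.1], [0.6, -1.2, 2.1, -0.1]]
t = 500
eps = [rng.gauss(0.0, 1.0) for _ in z0]
zt = q_sample(z0, alpha_bars[t], eps)
cond = neighbor_condition(neighbors)
x_in = denoiser_input(zt, t, cond)             # what a trained network would see
# With the true noise the inversion is exact; training drives eps_hat toward eps.
z0_rec = predict_z0(zt, alpha_bars[t], eps)
print(len(x_in), eps_mse(eps, eps))
```

Conditioning by concatenation is the simplest choice for a sketch; cross-attention or feature-wise modulation are common alternatives, and which one LGDiST uses is not stated here.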
Authors
Paula Cárdenas
Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
Leonardo Manrique
Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
Daniela Vega
Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
Daniela Ruiz
Universidad de los Andes
Computer vision, Biomedical Image Analysis, Transcriptomics, Machine learning, Deep learning
Pablo Arbeláez
Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia