Distance-Preserving Spatial Representations in Genomic Data

📅 2024-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Single-cell gene expression data often lack spatial coordinates, which hinders spatial biological interpretation. To address this, we propose dp-VAE, a variational autoencoder with a distance-preserving regularizer, presented as the first integration of such geometric constraints into VAEs for spatial reconstruction. We theoretically establish that dp-VAE satisfies the bi-Lipschitz condition, which supports geometric fidelity in spatial recovery and cross-dataset transferability. The framework reconstructs spatial coordinates end to end from non-spatial single-cell data without requiring spatial labels, and it supports zero-shot spatial inference. We evaluate dp-VAE on 27 public datasets and report substantial improvements in training robustness, cross-platform generalization, and the accuracy of spatial context completion. The approach offers an interpretable and transferable way to recover spatial information from non-spatial single-cell transcriptomic data.
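For readers unfamiliar with the term, the bi-Lipschitz condition can be stated in its standard textbook form as below. The symbols f (a map such as an encoder), d_X and d_Z (distances on the input and latent spaces), and K (the distortion constant) are notation chosen here for illustration and may differ from the paper's.

```latex
\[
\frac{1}{K}\, d_X(x, x') \;\le\; d_Z\bigl(f(x), f(x')\bigr) \;\le\; K\, d_X(x, x')
\qquad \text{for all } x, x' \in X,\ \text{with } K \ge 1 .
\]
```

A constant K close to 1 means that f nearly preserves distances, which is the sense in which the condition relates to geometric fidelity.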

📝 Abstract
The spatial context of single-cell gene expression data is crucial for many downstream analyses, yet often remains inaccessible due to practical and technical limitations, restricting the utility of such datasets. In this paper, we propose a generic representation learning and transfer learning framework, dp-VAE, capable of reconstructing the spatial coordinates associated with the provided gene expression data. Central to our approach is a distance-preserving regularizer integrated into the loss function during training, ensuring the model effectively captures and utilizes spatial context signals from reference datasets. During the inference stage, the latent representation produced by the model can be used to reconstruct or impute the spatial context of the provided gene expression data by solving a constrained optimization problem. We also explore the theoretical connections between distance-preserving loss, distortion, and the bi-Lipschitz condition within generative models. Finally, we demonstrate the effectiveness of dp-VAE in tasks involving training robustness, out-of-sample evaluation, and transfer learning inference by testing it on 27 publicly available datasets. This underscores its applicability to a wide range of genomics studies that were previously hindered by the absence of spatial data.
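The idea of a distance-preserving regularizer can be illustrated with a minimal sketch. The abstract does not give the exact form of the penalty, so the version below is an assumption: it compares pairwise Euclidean distances between latent codes with pairwise distances between known spatial coordinates and averages the squared gaps. The function names `distance_preserving_penalty` and `regularized_loss`, and the `weight` parameter, are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def distance_preserving_penalty(latent, coords):
    """Mean squared gap between latent and spatial pairwise distances.

    Assumed form for illustration only; the paper's regularizer may differ.
    `latent` and `coords` are equal-length sequences of points (tuples of floats).
    """
    if len(latent) != len(coords):
        raise ValueError("latent and coords must contain the same number of cells")
    pairs = list(combinations(range(len(latent)), 2))
    if not pairs:
        return 0.0  # fewer than two cells: no pairwise distances to compare
    gaps = [
        (math.dist(latent[i], latent[j]) - math.dist(coords[i], coords[j])) ** 2
        for i, j in pairs
    ]
    return sum(gaps) / len(gaps)


def regularized_loss(base_loss, latent, coords, weight=1.0):
    """Add the weighted penalty to a base training loss (e.g. a VAE objective)."""
    return base_loss + weight * distance_preserving_penalty(latent, coords)
```

Because the penalty depends only on pairwise distances, it is unchanged when the latent codes are translated or rotated as a whole, while uniform stretching of the latent space is penalized.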
Problem

Research questions and friction points this paper is trying to address.

Single-cell Gene Expression
Spatial Information
Data Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

dp-VAE
Spatial Information Recovery
Distance-Preserving Constraint
Authors
Wenbin Zhou
Heinz College of Information Systems and Public Policy and Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, 15213
Jin-Hong Du
Carnegie Mellon University
high-dimensional statistics, overparameterized learning, single-cell data analysis