🤖 AI Summary
In single-cell RNA sequencing (scRNA-seq), linear interpolation in the latent space of a variational autoencoder (VAE) is often assumed to follow geodesic paths on the underlying data manifold. Unless this is explicitly enforced, the assumption need not hold, which limits downstream methods that rely on Euclidean latent geometry (e.g., trajectory inference). FlatVI is a training framework that regularizes the latent manifold of discrete-likelihood VAEs toward Euclidean geometry, so that straight-line interpolations in the latent space approximate geodesics on the decoded single-cell manifold. Experiments on synthetic data support the theoretical soundness of the approach, and applications to time-resolved scRNA-seq data show improved trajectory reconstruction and manifold interpolation.
📝 Abstract
Latent space interpolations are a powerful tool for navigating deep generative models in applied settings. An example is single-cell RNA sequencing, where existing methods model cellular state transitions as latent space interpolations with variational autoencoders, often assuming linear shifts and Euclidean geometry. However, unless explicitly enforced, linear interpolations in the latent space may not correspond to geodesic paths on the data manifold, limiting methods that assume Euclidean geometry in the data representations. We introduce FlatVI, a novel training framework that regularises the latent manifold of discrete-likelihood variational autoencoders towards Euclidean geometry, specifically tailored for modelling single-cell count data. By encouraging straight lines in the latent space to approximate geodesic interpolations on the decoded single-cell manifold, FlatVI enhances compatibility with downstream approaches that assume Euclidean latent geometry. Experiments on synthetic data support the theoretical soundness of our approach, while applications to time-resolved single-cell RNA sequencing data demonstrate improved trajectory reconstruction and manifold interpolation.
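To make "regularising the latent manifold towards Euclidean geometry" concrete, the sketch below computes a pullback metric G(z) = J(z)^T J(z) from a decoder's Jacobian and penalises its distance from a scaled identity. This is an illustration under stated assumptions, not the FlatVI objective: the abstract does not specify the metric or the loss, and the names `jacobian_fd`, `pullback_metric`, `flatness_penalty`, the toy decoders, and the scale `alpha` are all hypothetical. It also assumes a Euclidean metric on the decoder output, which a method built for count data would presumably replace with a likelihood-aware one.

```python
import numpy as np


def jacobian_fd(f, z, eps=1e-6):
    """Central finite-difference Jacobian of f at z, shape (out_dim, latent_dim)."""
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        cols.append((f(z + step) - f(z - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)


def pullback_metric(f, z):
    """Pullback of the Euclidean output metric through f: G(z) = J^T J (assumption)."""
    J = jacobian_fd(f, z)
    return J.T @ J


def flatness_penalty(f, z, alpha=1.0):
    """Squared Frobenius distance between G(z) and alpha * I (hypothetical regulariser)."""
    G = pullback_metric(f, z)
    return float(np.sum((G - alpha * np.eye(G.shape[0])) ** 2))


# Toy decoder with positive outputs, loosely mimicking mean parameters of count data.
W = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])


def curved_decoder(z):
    return np.exp(W @ z)


# Toy decoder that is a linear isometry: Q has orthonormal columns, so G(z) = I everywhere.
Q = np.linalg.qr(np.random.default_rng(0).normal(size=(5, 2)))[0]


def flat_decoder(z):
    return Q @ z


z0 = np.zeros(2)
print("curved decoder penalty:", flatness_penalty(curved_decoder, z0))
print("flat decoder penalty:  ", flatness_penalty(flat_decoder, z0))
```

When the penalty is zero everywhere, the latent metric is a constant multiple of the identity, so straight latent lines are geodesics under that metric. In training, a term like this would be averaged over sampled latent points and added to the VAE loss with a weighting coefficient, and an automatic-differentiation Jacobian would replace the finite differences used here.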