🤖 AI Summary
Modeling the geometric structure of high-dimensional data from single-cell RNA sequencing and spatial genomics remains challenging: existing methods struggle to generate data in a geometry-aware way, to interpolate along meaningful trajectories, and to transport cell populations along feasible paths. Method: The authors propose GAGA, a framework that combines manifold learning with a generative autoencoder. It builds an embedding space that respects the intrinsic geometry found by manifold learning and learns a warped Riemannian metric from both points on the data manifold and negative samples off it, so that the geometry is meaningful across the entire latent space. With this metric, GAGA supports uniform sampling on the manifold, generation of points along geodesics, and interpolation between populations using geodesic-guided flows. Contribution/Results: GAGA shows competitive performance on simulated and real-world datasets, including a 30% improvement over state-of-the-art methods in single-cell population-level trajectory inference.
📝 Abstract
The rapid growth of high-dimensional datasets in fields such as single-cell RNA sequencing and spatial genomics has led to unprecedented opportunities for scientific discovery, but it also presents unique computational and statistical challenges. Traditional methods struggle with geometry-aware data generation, interpolation along meaningful trajectories, and transporting populations via feasible paths. To address these issues, we introduce the Geometry-Aware Generative Autoencoder (GAGA), a novel framework that combines extensible manifold learning with generative modeling. GAGA constructs a neural network embedding space that respects the intrinsic geometries discovered by manifold learning and learns a novel warped Riemannian metric on the data space. This warped metric is derived from both the points on the data manifold and negative samples off the manifold, allowing it to characterize a meaningful geometry across the entire latent space. Using this metric, GAGA can uniformly sample points on the manifold, generate points along geodesics, and interpolate between populations across the learned manifold using geodesic-guided flows. GAGA shows competitive performance on simulated and real-world datasets, including a 30% improvement over state-of-the-art methods in single-cell population-level trajectory inference.
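To make the idea of a warped metric concrete, the following toy sketch may help. It is not GAGA's implementation: the data manifold is assumed to be the unit circle in the plane, the function `off_manifold_score` is a hypothetical hand-written stand-in for whatever off-manifold signal a model would learn from negative samples, and the conformal form g(x) = (1 + alpha * s(x)^2) * I is an illustrative assumption. The sketch only shows why penalizing off-manifold regions makes a path that follows the manifold shorter than a straight shortcut through empty space.

```python
import numpy as np

def off_manifold_score(x):
    """Hypothetical stand-in for a learned score: distance from the unit circle."""
    return np.abs(np.linalg.norm(x, axis=-1) - 1.0)

def warped_length(curve, alpha=50.0):
    """Length of a polyline under the assumed metric g(x) = (1 + alpha * s(x)^2) * I.

    Each segment's Euclidean length is scaled by the square root of the warp
    factor, evaluated at the segment midpoint (midpoint quadrature rule).
    """
    seg = np.diff(curve, axis=0)              # segment vectors
    mid = 0.5 * (curve[1:] + curve[:-1])      # segment midpoints
    warp = np.sqrt(1.0 + alpha * off_manifold_score(mid) ** 2)
    return float(np.sum(warp * np.linalg.norm(seg, axis=1)))

# Two candidate paths from (1, 0) to (-1, 0), each with 200 segments.
t = np.linspace(0.0, 1.0, 201)
chord = np.stack([1.0 - 2.0 * t, np.zeros_like(t)], axis=1)      # straight line through the origin
arc = np.stack([np.cos(np.pi * t), np.sin(np.pi * t)], axis=1)   # half circle along the manifold

chord_euclid = warped_length(chord, alpha=0.0)  # alpha = 0 recovers Euclidean length
chord_warped = warped_length(chord)
arc_warped = warped_length(arc)

print(f"chord (Euclidean): {chord_euclid:.3f}")
print(f"chord (warped):    {chord_warped:.3f}")
print(f"arc   (warped):    {arc_warped:.3f}")
```

Under the Euclidean metric the chord is the shorter path, but under the warped metric the arc wins because it never leaves the region where the score is near zero. A geodesic solver that minimizes this warped length would therefore be pushed toward on-manifold paths, which is the qualitative behavior the abstract describes.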