AI Summary
This work addresses the trade-off between computational efficiency and geometric fidelity in manifold learning. We propose a scalable, score-based pullback Riemannian geometry framework: the pullback metric is built from the score of the data density (the gradient of its log-density), which yields closed-form geodesics and supports intrinsic dimension estimation. As a first step, the framework focuses on unimodal distributions. We construct a Riemannian autoencoder (RAE) with error bounds for discovering the data manifold dimension, and we use isometry regularization during training so that the framework can be combined with anisotropic normalizing flows. In numerical experiments on diverse datasets, including image data, the method produces high-quality geodesics that pass through the data support, reliably estimates the intrinsic dimension of the data manifold, and provides a global chart of the manifold. To the best of the authors' knowledge, it is the first scalable framework for extracting the complete geometry of the data manifold.
Abstract
Data-driven Riemannian geometry has emerged as a powerful tool for interpretable representation learning, offering improved efficiency in downstream tasks. Moving forward, it is crucial to balance cheap manifold mappings with efficient training algorithms. In this work, we integrate concepts from pullback Riemannian geometry and generative models to propose a framework for data-driven Riemannian geometry that is scalable in both geometry and learning: score-based pullback Riemannian geometry. Focusing on unimodal distributions as a first step, we propose a score-based Riemannian structure with closed-form geodesics that pass through the data probability density. With this structure, we construct a Riemannian autoencoder (RAE) with error bounds for discovering the correct data manifold dimension. This framework can naturally be used with anisotropic normalizing flows by adopting isometry regularization during training. Through numerical experiments on diverse datasets, including image data, we demonstrate that the proposed framework produces high-quality geodesics passing through the data support, reliably estimates the intrinsic dimension of the data manifold, and provides a global chart of the manifold. To the best of our knowledge, this is the first scalable framework for extracting the complete geometry of the data manifold.
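To make the phrase "closed-form geodesics" concrete, the following is a minimal sketch of generic pullback geometry, not the paper's score-based construction, which the abstract does not spell out. It assumes a known invertible map `phi` and pulls the Euclidean metric back through it; in that setting geodesics are straight lines in the image of `phi` mapped back through `phi_inv`, and distances are Euclidean distances between images. The map `phi`, its inverse, and the parabola-shaped toy data are hypothetical choices made only for illustration.

```python
import numpy as np

def phi(x):
    """Hypothetical diffeomorphism of R^2 that flattens the parabola x2 = x1**2 onto z2 = 0."""
    x = np.asarray(x, dtype=float)
    return np.stack([x[..., 0], x[..., 1] - x[..., 0] ** 2], axis=-1)

def phi_inv(z):
    """Exact inverse of phi."""
    z = np.asarray(z, dtype=float)
    return np.stack([z[..., 0], z[..., 1] + z[..., 0] ** 2], axis=-1)

def geodesic(x, y, t):
    """Geodesic under the pullback of the Euclidean metric through phi:
    interpolate linearly between phi(x) and phi(y), then map back."""
    t = np.asarray(t, dtype=float)[..., None]
    return phi_inv((1.0 - t) * phi(x) + t * phi(y))

def distance(x, y):
    """Pullback distance: Euclidean distance between the images under phi."""
    return np.linalg.norm(phi(x) - phi(y), axis=-1)

# Two points on the parabola, standing in for points in the data support.
x = np.array([-1.0, 1.0])
y = np.array([2.0, 4.0])

# Five points along the geodesic, including both endpoints.
path = geodesic(x, y, np.linspace(0.0, 1.0, 5))
```

In this toy setting every point of `path` stays on the parabola, whereas the straight Euclidean segment between `x` and `y` would leave it. This is the behaviour the abstract describes as geodesics passing through the data support; in the proposed framework the map would be learned from data rather than written by hand.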