🤖 AI Summary
This work addresses the reconstruction of cellular differentiation trees in single-cell trajectory inference. We propose a geometric framework that combines gene expression data with RNA velocity and compares cells through a varifold distance. The discrete vector field given by RNA velocity is integrated to generate one trajectory per cell, yielding curves in Euclidean gene-expression space that run from the root of the process to each cell. We then use the varifold distance, which to our knowledge has not previously been applied to trajectory inference, to quantify the similarity between these curves and thereby approximate shortest-path distances on the underlying developmental tree. On the theoretical side, we prove that the resulting similarity measure approximates the shortest-path distance in a tree isomorphic to the target tree. Empirically, the method recovers the differentiation topology on both synthetic and real single-cell datasets, improving the geometric fidelity and biological interpretability of the inferred tree-structured trajectories.
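The integration step described above can be illustrated with a minimal sketch. It assumes that velocities are observed only at the sampled cells, that the field between cells is approximated by a Gaussian-kernel average, and that integration uses explicit Euler steps backwards in time. None of these choices is stated in the source; the names `interpolate_velocity`, `trace_back`, `bandwidth`, `step`, and `n_steps` are hypothetical.

```python
import numpy as np

def interpolate_velocity(x, positions, velocities, bandwidth=1.0):
    """Gaussian-kernel average of the observed velocities at point x.

    Assumption: this smoothing scheme is an illustrative choice, not the
    interpolation used in the paper.
    """
    sq_dists = np.sum((positions - x) ** 2, axis=1)
    # Shift by the minimum before exponentiating for numerical stability.
    weights = np.exp(-(sq_dists - sq_dists.min()) / (2.0 * bandwidth ** 2))
    return weights @ velocities / weights.sum()

def trace_back(x0, positions, velocities, step=0.05, n_steps=100, bandwidth=1.0):
    """Follow the velocity field backwards from x0 with explicit Euler steps.

    Returns an array of shape (n_steps + 1, d), ordered from the earliest
    reconstructed state to the cell's current state x0.
    """
    path = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        v = interpolate_velocity(path[-1], positions, velocities, bandwidth)
        path.append(path[-1] - step * v)  # step against the velocity
    return np.array(path[::-1])
```

In practice the number of steps would need a stopping rule (for example, reaching a region identified as the initial stage); the fixed `n_steps` here only keeps the sketch short.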
📝 Abstract
In this paper, we consider a tree inference problem motivated by a central problem in single-cell genomics: reconstructing dynamic cellular processes from sequencing data. In particular, given a population of cells sampled from such a process, we are interested in ordering the cells according to their progression in the process. This is known as trajectory inference. When the process is differentiation, this amounts to reconstructing the corresponding differentiation tree. One way of doing this in practice is to estimate the shortest-path distance between nodes based on cell similarities observed in sequencing data. Recent sequencing techniques make it possible to measure two types of data: gene expression levels and RNA velocity, a vector that predicts changes in gene expression. The data then consist of a discrete vector field on a subset of a Euclidean space whose dimension equals the number of genes under consideration. By integrating this velocity field, we trace the evolution of gene expression levels in each single cell from some initial stage to its current stage. Ultimately, we assume that the differentiation tree is faithfully embedded in a Euclidean space, and that we observe it only through the curves representing the paths from the root to the nodes. Using varifold distances between such curves, we define a similarity measure between nodes, and we prove that it approximates the shortest-path distance in a tree that is isomorphic to the target tree.
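To make the notion of a varifold distance between root-to-node curves concrete, here is a minimal sketch for polygonal curves. It assumes a common kernel construction: each segment contributes its length, midpoint, and unit tangent; positions are compared with a Gaussian kernel of width `sigma`; and directions are compared with the orientation-invariant kernel (t_i · t_j)^2. The abstract does not specify the kernels or discretization used in the paper, so these choices and the function names are assumptions.

```python
import numpy as np

def segments(curve):
    """Split a polygonal curve (n, d) into midpoints, unit tangents, and lengths."""
    curve = np.asarray(curve, dtype=float)
    diffs = curve[1:] - curve[:-1]
    lengths = np.linalg.norm(diffs, axis=1)
    keep = lengths > 0  # drop degenerate (repeated-point) segments
    centers = 0.5 * (curve[1:] + curve[:-1])[keep]
    tangents = diffs[keep] / lengths[keep, None]
    return centers, tangents, lengths[keep]

def varifold_inner(a, b, sigma=1.0):
    """Kernel inner product between the discrete varifolds of curves a and b."""
    ca, ta, la = segments(a)
    cb, tb, lb = segments(b)
    sq_dists = np.sum((ca[:, None, :] - cb[None, :, :]) ** 2, axis=2)
    k_pos = np.exp(-sq_dists / (2.0 * sigma ** 2))  # Gaussian kernel on positions
    k_dir = (ta @ tb.T) ** 2                        # unoriented kernel on tangents
    return float(la @ (k_pos * k_dir) @ lb)

def varifold_distance(a, b, sigma=1.0):
    """Distance induced by the inner product: sqrt(<a,a> - 2<a,b> + <b,b>)."""
    sq = varifold_inner(a, a, sigma) - 2.0 * varifold_inner(a, b, sigma) \
        + varifold_inner(b, b, sigma)
    return float(np.sqrt(max(sq, 0.0)))  # clip tiny negative rounding errors

# Toy tree: a shared trunk along the x-axis that splits into two branches.
trunk = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
up = trunk + [(1.5, 0.5), (2.0, 1.0)]
up_short = trunk + [(1.5, 0.5)]
down = trunk + [(1.5, -0.5), (2.0, -1.0)]
```

In this toy tree, the contributions of the shared trunk cancel in the squared distance, so `varifold_distance(up, up_short)` depends only on the extra segment of `up`, while `varifold_distance(up, down)` depends on both diverging branches. This is the intuition behind using such distances as a proxy for shortest-path distances on the tree.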