🤖 AI Summary
This work addresses the problem of comparing distributional dynamics in causal inference and domain adaptation. Distributions that evolve under different external forces or initial conditions live in Wasserstein space, which lacks the vector space structure that classical methods rely on. The authors propose the "Wasserstein Parallel Trends" principle, which parallel-transports tangent dynamics along optimal transport geodesics, and introduce a novel fanning scheme that efficiently approximates this parallel transport and yields the first theoretical guarantees for parallel transport in Wasserstein space. The principle extends the classical parallel trends assumption from averages to entire probability distributions and recovers it as a special case. The authors also derive closed-form parallel transport for Gaussian measures. The method is evaluated on synthetic data and two single-cell RNA-seq datasets, where it imputes gene-expression dynamics across biological systems.
📝 Abstract
Many scientific systems, such as cellular populations or economic cohorts, are naturally described by probability distributions that evolve over time. Predicting how such a system would have evolved under different forces or initial conditions is fundamental to causal inference, domain adaptation, and counterfactual prediction. However, the space of distributions often lacks the vector space structure on which classical methods rely. To address this, we introduce a general notion of parallel dynamics at a distributional level. We base this principle on parallel transport of tangent dynamics along optimal transport geodesics and call it "Wasserstein Parallel Trends". By replacing the vector subtraction of classic methods with geodesic parallel transport, we can provide counterfactual comparisons of distributional dynamics in applications such as causal inference, domain adaptation, and batch-effect correction in experimental settings. The main mathematical contribution is a novel notion of fanning scheme on the Wasserstein manifold that allows us to efficiently approximate parallel transport along geodesics while also providing the first theoretical guarantees for parallel transport in the Wasserstein space. We also show that Wasserstein Parallel Trends recovers the classic parallel trends assumption for averages as a special case and derive closed-form parallel transport for Gaussian measures. We deploy the method on synthetic data and two single-cell RNA sequencing datasets to impute gene-expression dynamics across biological systems.
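For orientation, the sketch below illustrates two standard ingredients that the abstract builds on. It is not the paper's fanning scheme, nor its closed-form parallel transport formula. The first function is the classic parallel trends imputation for averages, which uses the vector subtraction that the abstract says is replaced by geodesic parallel transport. The other functions compute the well-known closed-form optimal transport map and squared 2-Wasserstein distance between two Gaussian measures. All function names (`parallel_trends_mean`, `gaussian_ot_map`, `gaussian_w2_squared`, `psd_sqrt`) are hypothetical and chosen for illustration only, and the source covariance is assumed to be positive definite.

```python
import numpy as np


def parallel_trends_mean(treated_pre, control_pre, control_post):
    """Classic parallel trends on averages (illustrative helper).

    The counterfactual post-period mean of the treated group is its
    pre-period mean plus the change observed in the control group.
    """
    return treated_pre + (control_post - control_pre)


def psd_sqrt(S):
    """Symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    # Clip tiny negative eigenvalues caused by floating-point round-off.
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def gaussian_ot_map(m1, S1, m2, S2):
    """Affine optimal transport map T(x) = A @ x + b from N(m1, S1) to N(m2, S2).

    Standard closed form: A = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}.
    Assumes S1 is positive definite so that S1^{-1/2} exists.
    """
    r = psd_sqrt(S1)
    r_inv = np.linalg.inv(r)
    A = r_inv @ psd_sqrt(r @ S2 @ r) @ r_inv
    b = m2 - A @ m1
    return A, b


def gaussian_w2_squared(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r = psd_sqrt(S1)
    cross = psd_sqrt(r @ S2 @ r)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))


# Toy inputs, made up for illustration only.
m1, S1 = np.zeros(2), np.diag([1.0, 4.0])
m2, S2 = np.ones(2), np.diag([4.0, 1.0])
A, b = gaussian_ot_map(m1, S1, m2, S2)
```

The map is a valid pushforward when `A @ S1 @ A.T` equals `S2` and `A @ m1 + b` equals `m2`, which gives a quick sanity check on any inputs. The means-only function shows the special case the abstract refers to: when only averages are tracked, the counterfactual is obtained by plain vector subtraction and addition.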