🤖 AI Summary
This work proposes an intrinsic regression framework in Wasserstein space for settings where both the response and multiple predictors are probability distributions, addressing the modeling challenges posed by the space’s nonlinear geometry. The method aligns predictor distributions via optimal transport and aggregates them through a weighted Fréchet mean to construct a mapping from multiple input distributions to a distributional output. The model accommodates multiple distributional covariates, yields interpretable weights that quantify the contribution of each predictor, and is invariant to arbitrary choices of reference distributions. Theoretical analysis establishes identifiability of the regression operator and consistency of its estimator. Both simulations and real-data experiments demonstrate that the proposed approach outperforms existing Wasserstein regression methods in terms of predictive accuracy and interpretability.
📝 Abstract
We study distribution-on-distribution regression problems in which a response distribution depends on multiple distributional predictors. Such settings arise naturally in applications where the outcome distribution is driven by several heterogeneous distributional sources, yet remain challenging due to the nonlinear geometry of the Wasserstein space. We propose an intrinsic regression framework that aggregates predictor-specific transported distributions through a weighted Fr\'echet mean in the Wasserstein space. The resulting model admits multiple distributional predictors, assigns interpretable weights quantifying their relative contributions, and defines a flexible regression operator that is invariant to auxiliary construction choices, such as the selection of a reference distribution. From a theoretical perspective, we establish identifiability of the induced regression operator and derive asymptotic guarantees for its estimation under a predictive Wasserstein semi-norm, which directly characterizes convergence of the composite prediction map. Extensive simulation studies and a real data application demonstrate the improved predictive performance and interpretability of the proposed approach compared with existing Wasserstein regression methods.