🤖 AI Summary
Classical matching methods ignore the geometry of the data manifold on which the covariates lie, which can bias causal effect estimates, particularly when covariates are high-dimensional and noisy. To address this, we propose GeoMatching, a framework that combines implicit Riemannian manifold learning with matching-based causal inference. GeoMatching first learns a low-dimensional latent Riemannian manifold that captures both the geometry of the confounders and the uncertainty of the embedding, and then matches treated and control units in that latent space using geodesic distances under the learned metric. Experiments on synthetic and real-world datasets show more accurate treatment effect estimates, and GeoMatching remains robust as input dimensionality grows, under outlier contamination, and when labeled data are limited (semi-supervised settings). The work links geometric deep learning with matching-based causal inference.
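The summary does not spell out how the latent metric or its "uncertainty-aware" component is constructed. One standard construction from the latent-variable-model literature, assumed here purely for illustration and not confirmed as GeoMatching's own, pulls the ambient Euclidean metric back through a stochastic decoder $x = \mu(z) + \sigma(z) \odot \epsilon$, where $J_\mu$ and $J_\sigma$ are the Jacobians of the decoder mean and standard deviation. When $\sigma$ is designed to grow away from the training data, the $J_\sigma$ term inflates lengths there, so shortest paths tend to stay close to observed units. The geodesic distance used for matching is then the infimum of curve lengths between two latent codes:

```latex
\begin{align}
\bar{G}(z) &= J_\mu(z)^\top J_\mu(z) + J_\sigma(z)^\top J_\sigma(z), \\
L[\gamma] &= \int_0^1 \sqrt{\dot{\gamma}(t)^\top \, \bar{G}\big(\gamma(t)\big) \, \dot{\gamma}(t)} \, \mathrm{d}t, \\
d_{\bar{G}}(z_a, z_b) &= \inf_{\gamma:\ \gamma(0) = z_a,\ \gamma(1) = z_b} L[\gamma].
\end{align}
```

Under this assumption, matching a treated unit means selecting the control unit whose latent code has the smallest $d_{\bar{G}}$ to it.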
📝 Abstract
Matching is a popular approach in causal inference to estimate treatment effects by pairing treated and control units that are most similar in terms of their covariate information. However, classical matching methods completely ignore the geometry of the data manifold, which is crucial for defining a meaningful distance for matching, and they struggle when covariates are noisy and high-dimensional. In this work, we propose GeoMatching, a matching method to estimate treatment effects that takes into account the intrinsic data geometry induced by existing causal mechanisms among the confounding variables. First, we learn a low-dimensional, latent Riemannian manifold that accounts for the uncertainty and geometry of the original input data. Second, we estimate treatment effects via matching in the latent space based on the learned latent Riemannian metric. We provide theoretical insights and empirical results in synthetic and real-world scenarios, demonstrating that GeoMatching yields more effective treatment effect estimators, even as input dimensionality increases, in the presence of outliers, and in semi-supervised scenarios.
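The two-step recipe (learn a latent Riemannian metric, then match in latent space) can be illustrated with a small NumPy sketch. Everything here is an assumption rather than the authors' implementation: `decoder` is a hypothetical, hand-written map standing in for a trained generative model, the metric is the deterministic pullback $J^\top J$ (no uncertainty term), and instead of solving for true geodesics, `curve_length` measures the straight latent segment, which only upper-bounds the geodesic distance. `att_matching` is plain 1-nearest-neighbour matching with replacement, estimating the average treatment effect on the treated (ATT) under any pluggable distance.

```python
import numpy as np


def decoder(z):
    """HYPOTHETICAL decoder mean g: R^2 -> R^3, standing in for a trained generative model."""
    return np.array([z[0], z[1], np.sin(2.0 * z[0]) * np.cos(z[1])])


def jacobian(f, z, eps=1e-6):
    """Central finite-difference Jacobian of f at z, shape (output_dim, latent_dim)."""
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        cols.append((f(z + step) - f(z - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)


def curve_length(z_a, z_b, f=decoder, n_steps=32):
    """Length of the straight latent segment z_a -> z_b under the pullback metric J^T J.

    The segment is not optimised, so its length upper-bounds the geodesic
    distance; the midpoint rule below only approximates that length.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    dz = (z_b - z_a) / n_steps
    length = 0.0
    for k in range(n_steps):
        t = (k + 0.5) / n_steps                     # midpoint of the k-th sub-interval
        J = jacobian(f, (1.0 - t) * z_a + t * z_b)
        length += np.linalg.norm(J @ dz)            # sqrt(dz^T (J^T J) dz)
    return float(length)


def att_matching(z, y, treated, dist=curve_length):
    """1-nearest-neighbour matching estimate of the ATT (controls reused with replacement)."""
    treated = np.asarray(treated, dtype=bool)
    controls = np.flatnonzero(~treated)
    effects = []
    for i in np.flatnonzero(treated):
        d = [dist(z[i], z[j]) for j in controls]
        j = controls[int(np.argmin(d))]             # closest control under `dist`
        effects.append(y[i] - y[j])
    return float(np.mean(effects))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(40, 2))                                # latent codes, assumed already inferred
    treated = rng.random(40) < 1.0 / (1.0 + np.exp(-z[:, 0]))   # assignment confounded by z
    x = np.array([decoder(zi) for zi in z])                     # observed covariates
    y = x.sum(axis=1) + 2.0 * treated                           # true treatment effect is 2.0
    print("ATT estimate:", att_matching(z, y, treated))
```

Keeping the distance as a parameter makes it easy to compare latent-geodesic matching against, for example, Euclidean matching on the raw covariates using the same estimator. A faithful reproduction would replace `decoder` with a trained model and the straight segment with an optimised geodesic.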