AI Summary
This study addresses the challenge of causal effect estimation in settings characterized by high-dimensional covariates, multi-valued treatments, and predominantly observational data with only limited randomized controlled trial (RCT) samples. In such scenarios, treatment-induced structural non-overlap violates the overlap assumption required by conventional weighting-based fusion methods, undermining causal identification. The paper formally characterizes this problem, revealing fundamental limitations of existing approaches, and proposes a constrained joint estimation framework that enforces causal validity through orthogonal experimental moment conditions while simultaneously learning representations and predictors to recover effective overlap. A penalized primal-dual algorithm is developed, accompanied by an oracle inequality that accounts for overlap recovery error, moment violation, and statistical error. Experiments demonstrate the method's robustness across varying degrees of non-overlap and its superior performance in a large-scale ride-hailing application, matching the accuracy of baselines that would require several times more RCT data.
Abstract
Causal inference in modern large-scale systems faces growing challenges, including high-dimensional covariates, multi-valued treatments, massive observational (OBS) data, and limited randomized controlled trial (RCT) samples due to cost constraints. We formalize treatment-induced structural non-overlap and show that, under this regime, commonly used weighted fusion methods provably fail to satisfy randomized identifying restrictions. To address this issue, we propose a constrained joint estimation framework that minimizes observational risk while enforcing causal validity through orthogonal experimental moment conditions. We further show that structural non-overlap creates a feasibility obstruction for moment enforcement in the original covariate space. We also derive a penalized primal-dual algorithm that jointly learns representations and predictors, and establish oracle inequalities decomposing error into overlap recovery, moment violation, and statistical terms. Extensive synthetic experiments demonstrate robust performance under varying degrees of non-overlap. A large-scale ride-hailing application shows that our method achieves substantial gains over existing baselines, matching the performance of models trained with significantly more RCT data.
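The abstract does not spell out the algorithm, so the following is only a toy illustration of the general idea of minimizing observational risk subject to an experimental moment condition with a penalized primal-dual loop. It is not the paper's method. It assumes a binary treatment, a fixed linear feature map (no representation learning), a single moment condition built from the centered randomized treatment, and simulated data with a hidden confounder. Every name and hyperparameter (`simulate`, `primal_dual_fit`, `rho`, `lr`, `dual_lr`) is hypothetical.

```python
import numpy as np

def simulate(n, randomized, rng):
    """Toy data: hidden confounder u drives treatment in OBS but not in RCT."""
    u = rng.normal(size=n)                      # unobserved confounder
    x = rng.normal(size=n)                      # observed covariate
    if randomized:
        t = rng.integers(0, 2, size=n).astype(float)
    else:
        t = (u + rng.normal(size=n) > 0).astype(float)
    y = 1.0 + x + 2.0 * t + 2.0 * u + rng.normal(size=n)  # true effect = 2
    phi = np.column_stack([np.ones(n), x, t])   # fixed linear features
    return phi, y, t

def primal_dual_fit(phi_obs, y_obs, phi_rct, y_rct, z_rct,
                    rho=10.0, lr=0.1, dual_lr=0.1, steps=5000):
    """Gradient descent on theta, gradient ascent on the multiplier lam, for
    L = obs_risk(theta) + lam * g(theta) + (rho / 2) * g(theta)**2,
    where g(theta) = mean(z * (y - phi @ theta)) over the RCT sample."""
    n_obs, n_rct = len(y_obs), len(y_rct)
    theta = np.zeros(phi_obs.shape[1])
    lam = 0.0
    jac = -(z_rct @ phi_rct) / n_rct            # dg/dtheta, constant here
    for _ in range(steps):
        grad_risk = phi_obs.T @ (phi_obs @ theta - y_obs) / n_obs
        g = z_rct @ (y_rct - phi_rct @ theta) / n_rct
        theta = theta - lr * (grad_risk + (lam + rho * g) * jac)  # primal step
        lam = lam + dual_lr * g                                   # dual step
    g_final = z_rct @ (y_rct - phi_rct @ theta) / n_rct
    return theta, lam, g_final

rng = np.random.default_rng(0)
phi_obs, y_obs, _ = simulate(20000, randomized=False, rng=rng)
phi_rct, y_rct, t_rct = simulate(500, randomized=True, rng=rng)
z_rct = t_rct - t_rct.mean()                    # centered randomized treatment

# Baseline: least squares on observational data only (confounded).
theta_obs, *_ = np.linalg.lstsq(phi_obs, y_obs, rcond=None)
# Constrained fit: observational risk plus the RCT moment condition.
theta_pd, lam, g_final = primal_dual_fit(phi_obs, y_obs, phi_rct, y_rct, z_rct)

print("treatment coefficient, OBS only:    ", theta_obs[2])
print("treatment coefficient, primal-dual: ", theta_pd[2])
print("final moment violation:             ", g_final)
```

In this toy setup the moment condition asks that RCT residuals be uncorrelated with the randomized treatment, which pulls the treatment coefficient away from the confounded observational fit. The structural non-overlap and learned representations that the abstract emphasizes are deliberately left out of the sketch.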