🤖 AI Summary
Problem: Estimating a causal dose–response function over the full exposure range is difficult with a single data source, because one source rarely provides enough data to estimate responses precisely at every exposure level.
Method: We propose a data fusion framework that draws on multiple data sources partially aligned with the target distribution. Its core is a Neyman-orthogonal loss function tailored to dose–response estimation under data fusion, paired with a stochastic approximation that preserves orthogonality while reducing computational cost. Applying kernel ridge regression to this approximation yields closed-form estimators.
Contribution/Results: Theoretically, we show that incorporating additional data sources tightens the finite-sample regret bound and improves worst-case performance, as confirmed by comparison with a minimax lower bound. Simulation studies show improved estimation accuracy over single-source estimation, supporting the use of data fusion for non-smooth causal parameters such as dose–response functions.
📝 Abstract
Estimating the causal dose-response function is challenging, particularly when data from a single source are insufficient to estimate responses precisely across all exposure levels. To overcome this limitation, we propose a data fusion framework that leverages multiple data sources that are partially aligned with the target distribution. Specifically, we derive a Neyman-orthogonal loss function tailored for estimating the dose-response function within data fusion settings. To improve computational efficiency, we propose a stochastic approximation that retains orthogonality. We apply kernel ridge regression with this approximation, which provides closed-form estimators. Our theoretical analysis demonstrates that incorporating additional data sources yields tighter finite-sample regret bounds and improved worst-case performance, as confirmed via minimax lower bound comparison. Simulation studies validate the practical advantages of our approach, showing improved estimation accuracy when employing data fusion. This study highlights the potential of data fusion for estimating non-smooth parameters such as causal dose-response functions.
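The closed-form property mentioned above comes from kernel ridge regression itself. The sketch below shows only the standard closed form, alpha = (K + n·lam·I)^{-1} y, fitted on pooled toy data from two hypothetical sources that cover different dose ranges. It is not the paper's estimator: the Neyman-orthogonal loss and the stochastic approximation are not specified in the abstract, so the regression targets here are plain noisy outcomes, and the function names, Gaussian kernel, bandwidth, and penalty value are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian (RBF) kernel matrix between two 1-D arrays of doses."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)

def krr_fit(doses, targets, lam=1e-2, bandwidth=0.2):
    """Closed-form kernel ridge coefficients: (K + n*lam*I)^{-1} y."""
    n = doses.shape[0]
    K = gaussian_kernel(doses, doses, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), targets)

def krr_predict(grid, doses, alpha, bandwidth=0.2):
    """Evaluate the fitted function on a grid of dose levels."""
    return gaussian_kernel(grid, doses, bandwidth) @ alpha

# Hypothetical toy data: each source covers only part of the dose range.
rng = np.random.default_rng(0)
doses_a = rng.uniform(0.0, 0.6, size=100)   # source A: low doses
doses_b = rng.uniform(0.4, 1.0, size=100)   # source B: high doses
doses = np.concatenate([doses_a, doses_b])  # naive pooling, an assumption

# Placeholder targets (noisy outcomes), NOT the paper's orthogonal loss.
targets = np.sin(2 * np.pi * doses) + 0.1 * rng.normal(size=doses.shape[0])

alpha = krr_fit(doses, targets)
grid = np.linspace(0.0, 1.0, 5)
estimate = krr_predict(grid, doses, alpha)
```

Pooling gives the fit support over the whole interval [0, 1], which neither toy source has alone. In the paper's setting the sources are only partially aligned with the target distribution, so a naive pooled fit like this one would generally need the orthogonal correction the abstract describes.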