🤖 AI Summary
Problem: Estimating causal effects via semiparametric models in multi-source heterogeneous data (e.g., cross-hospital EHRs or clinical trials) poses challenges in privacy preservation and statistical efficiency due to data silos and structural heterogeneity.
Method: We propose a privacy-preserving late-fusion multi-task learning framework that integrates double machine learning with an adaptive task aggregation mechanism. It enables two-stage estimation without sharing individual-level data and employs joint estimation of nuisance parameters across tasks, leveraging their similarity.
Contribution/Results: We establish theoretical guarantees showing that, when tasks share similar parametric components, the proposed estimator converges faster than individual task learning. Simulations and real data applications support the theory and show that the framework is effective for heterogeneous treatment effect estimation even at moderate sample sizes, without sharing individual-level data.
📝 Abstract
In the age of large and heterogeneous datasets, the integration of information from diverse sources is essential to improve parameter estimation. Multi-task learning offers a powerful approach by enabling simultaneous learning across related tasks. In this work, we introduce a late fusion framework for multi-task learning with semiparametric models that involve infinite-dimensional nuisance parameters, focusing on applications such as heterogeneous treatment effect estimation across multiple data sources, including electronic health records from different hospitals or clinical trial data. Our framework is two-step: first, initial double machine learning estimators are obtained through individual task learning; second, these estimators are adaptively aggregated to exploit task similarities while remaining robust to task-specific differences. Notably, the framework avoids sharing individual-level data, thereby preserving privacy. Additionally, we propose a novel multi-task learning method for nuisance parameter estimation, which further enhances parameter estimation when nuisance parameters exhibit similarity across tasks. We establish theoretical guarantees for the method, demonstrating faster convergence rates compared to individual task learning when tasks share similar parametric components. Extensive simulations and real data applications complement our theoretical findings and highlight the effectiveness of the framework even at moderate sample sizes.
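The two-step structure described above can be illustrated with a minimal sketch. It is not the authors' estimator. It assumes a partially linear model y = theta * d + g(x) + noise at each site, uses least squares as a placeholder nuisance learner, and replaces the paper's adaptive aggregation with a simple stand-in: soft-thresholding each site's estimate toward the cross-site median. The function names `dml_plm` and `aggregate` and the tuning parameter `lam` are hypothetical.

```python
import numpy as np


def dml_plm(y, d, x, n_folds=2, seed=0):
    """Step 1 (run locally at each site): cross-fitted DML for a partially
    linear model y = theta * d + g(x) + noise.

    Assumption: the nuisances E[y|x] and E[d|x] are fit by least squares on
    [1, x]; any ML learner could be substituted here.
    Returns (theta_hat, standard_error).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    design = np.column_stack([np.ones(n), x])
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Fit nuisances on the training folds, residualize the held-out fold.
        coef_y, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        coef_d, *_ = np.linalg.lstsq(design[train], d[train], rcond=None)
        y_res[test_idx] = y[test_idx] - design[test_idx] @ coef_y
        d_res[test_idx] = d[test_idx] - design[test_idx] @ coef_d
    # Residual-on-residual regression gives the orthogonalized estimate.
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(d_res ** 2)
    return theta, se


def aggregate(thetas, ses, lam=2.0):
    """Step 2 (run centrally on summary statistics only): shrink each site's
    estimate toward the cross-site median by soft-thresholding.

    Assumption: this rule is an illustrative stand-in for adaptive
    aggregation, not the rule proposed in the paper.
    """
    thetas = np.asarray(thetas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    anchor = np.median(thetas)
    diff = thetas - anchor
    # Sites within lam * se of the anchor collapse onto it; sites far from
    # the anchor keep most of their own estimate.
    shrunk = np.sign(diff) * np.maximum(np.abs(diff) - lam * ses, 0.0)
    return anchor + shrunk
```

In this sketch each site would call `dml_plm` on its own data and send only the pair `(theta_hat, standard_error)` to the coordinator, which then calls `aggregate`. No individual-level records leave a site, which mirrors the late fusion idea in the abstract.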