🤖 AI Summary
Estimating heterogeneous treatment effects (HTE) from multi-source data is challenging because population heterogeneity and privacy constraints hinder data pooling. Method: We propose a novel doubly robust framework that integrates targeted learning with federated learning. It defines a projection-based estimand for a target population, combines information across sites without sharing raw data, corrects for covariate distribution shift among sources, and adaptively identifies non-transportable data sources. The framework supports difference-based estimands for continuous outcomes and odds ratio-based estimands for binary outcomes. Contribution/Results: We introduce a communication-efficient bootstrap-based selection procedure that screens out non-transportable sources so that aggregation does not introduce bias. Simulation studies show improved performance over existing methods. An application to nationwide Medicare-linked data illustrates the practical utility of the approach in a real-world federated healthcare setting.
📝 Abstract
Analyzing data from multiple sources offers valuable opportunities to improve the estimation efficiency of causal estimands. However, such analysis also poses challenges due to population heterogeneity and data privacy constraints. While several advanced methods for causal inference in federated settings have been developed in recent years, many focus on difference-based average causal effects and are not designed to study effect modification. In this study, we introduce a novel targeted-federated learning framework to study heterogeneity of treatment effects (HTE) in a target population by proposing a projection-based estimand. This HTE framework integrates information from multiple data sources without sharing raw data, while accounting for covariate distribution shifts among sources. The proposed approach is shown to be doubly robust, and it supports both difference-based estimands for continuous outcomes and odds ratio-based estimands for binary outcomes. Furthermore, we develop a communication-efficient bootstrap-based selection procedure to detect non-transportable data sources, enabling robust information aggregation without introducing bias. Extensive simulation studies demonstrate the superior performance of the proposed estimator over existing methods, and a real-world application using nationwide Medicare-linked data illustrates the utility of the approach.
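To make the ideas of double robustness and a projection-based HTE summary concrete, the following is a minimal single-site sketch and not the authors' estimator. It assumes a generic augmented inverse probability weighting (AIPW) pseudo-outcome and a least-squares projection onto a linear working model in one effect modifier. The function names `aipw_pseudo_outcome` and `project_hte`, the simulated data, and the working model are all hypothetical choices made for illustration; federation, covariate-shift correction, and source selection are not shown.

```python
import numpy as np


def aipw_pseudo_outcome(y, a, ps, mu1, mu0):
    """Generic AIPW pseudo-outcome (illustrative, not the paper's estimator).

    Its conditional mean given covariates equals the treatment effect when
    either the propensity score `ps` or the outcome regressions `mu1`, `mu0`
    are correctly specified.
    """
    return mu1 - mu0 + a * (y - mu1) / ps - (1 - a) * (y - mu0) / (1 - ps)


def project_hte(pseudo, modifier):
    """Least-squares projection of the pseudo-outcome onto [1, modifier]."""
    design = np.column_stack([np.ones(len(pseudo)), modifier])
    beta, *_ = np.linalg.lstsq(design, pseudo, rcond=None)
    return beta


# Simulated single-site data (assumed setup): true effect is 1 + 2 * x.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
true_ps = 1.0 / (1.0 + np.exp(-0.5 * x))
a = rng.binomial(1, true_ps)
y = x + a * (1.0 + 2.0 * x) + rng.normal(size=n)

# Case 1: correct outcome models, deliberately wrong (constant) propensity.
beta_outcome_ok = project_hte(
    aipw_pseudo_outcome(y, a, ps=np.full(n, 0.5), mu1=1.0 + 3.0 * x, mu0=x),
    x,
)

# Case 2: correct propensity, deliberately wrong (all-zero) outcome models.
beta_ps_ok = project_hte(
    aipw_pseudo_outcome(y, a, ps=true_ps, mu1=np.zeros(n), mu0=np.zeros(n)),
    x,
)

print("outcome models correct:", beta_outcome_ok)
print("propensity correct:    ", beta_ps_ok)
```

In both cases the projected coefficients should land near the true intercept and slope (1 and 2), with the second case noisier because it relies on inverse weighting alone. In a federated version, each site would presumably share only summary quantities rather than the rows of `x`, `a`, and `y`; the abstract does not specify which.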