Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift

📅 2025-01-30
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study addresses causal survival analysis across multiple data sources with heterogeneous distributions, where right-censoring, the integration of discrete and continuous time, and cross-site distribution shift complicate estimation. It proposes two methods for estimating target-site-specific causal effects: (i) a semiparametric efficient estimator for settings where individual-level data can be shared across sites, and (ii) a federated learning framework for privacy-constrained settings that dynamically reweights source-site contributions to account for discrepancies with the target population. Both methods use flexible, nonparametric machine learning models to improve robustness and efficiency. The approaches are illustrated through simulation studies and an application to multi-site randomized trials of monoclonal neutralizing antibodies for HIV-1 prevention.

📝 Abstract
Causal inference across multiple data sources offers a promising avenue to enhance the generalizability and replicability of scientific findings. However, data integration methods for time-to-event outcomes, common in biomedical research, are underdeveloped. Existing approaches focus on binary or continuous outcomes but fail to address the unique challenges of survival analysis, such as censoring and the integration of discrete and continuous time. To bridge this gap, we propose two novel methods for estimating target site-specific causal effects in multi-source settings. First, we develop a semiparametric efficient estimator for settings where individual-level data can be shared across sites. Second, we introduce a federated learning framework designed for privacy-constrained environments, which dynamically reweights source-specific contributions to account for discrepancies with the target population. Both methods leverage flexible, nonparametric machine learning models to improve robustness and efficiency. We illustrate the utility of our approaches through simulation studies and an application to multi-site randomized trials of monoclonal neutralizing antibodies for HIV-1 prevention, conducted among cisgender men and transgender persons in the United States, Brazil, Peru, and Switzerland, as well as among women in sub-Saharan Africa. Our findings underscore the potential of these methods to enable efficient, privacy-preserving causal inference for time-to-event outcomes under distribution shift.
Problem

Research questions and friction points this paper is trying to address.

Address causal survival analysis challenges under distribution shift
Develop methods for multi-source data with privacy constraints
Improve robustness in time-to-event outcome integration
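
To make the censoring challenge concrete, here is a minimal sketch of the classical Kaplan-Meier survival estimator in plain Python. It is a textbook illustration of how right-censored observations still contribute to the risk set, not the estimator proposed in the paper; the function name `kaplan_meier` and the toy data are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  : observed follow-up time per subject (event or censoring time)
    events : 1 if the event was observed, 0 if the subject was right-censored
    """
    # Number of observed events at each time.
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    # Every subject leaves the risk set at their observed time.
    exit_counts = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit update: multiply by the conditional survival at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored subjects are removed only after counting in the risk set at t.
        at_risk -= exit_counts[t]
    return curve


# Toy data (hypothetical): five subjects, two of them censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored subjects never trigger a drop in the curve, but they do enlarge the denominator until they leave, which is why simply discarding them would bias the estimate.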
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semiparametric efficient estimator for shared individual-level data
Federated learning framework for privacy-constrained environments
Nonparametric machine learning models for robustness and efficiency
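
The idea of reweighting source-site contributions by their discrepancy with the target can be sketched with a toy heuristic. This summary does not specify the paper's actual weighting scheme, so everything below is an assumption for illustration: the function `fuse_estimates`, the `penalty` parameter, and the exponential down-weighting rule are hypothetical. Only site-level summaries (an estimate and its variance) are exchanged, which is the kind of information a privacy-constrained setting might allow.

```python
import math


def fuse_estimates(target_est, target_var, source_ests, source_vars, penalty=1.0):
    """Combine a target-site estimate with source-site estimates.

    Hypothetical heuristic (not the paper's method): each source starts from
    an inverse-variance weight, which is then shrunk exponentially in its
    standardized squared distance from the target estimate.
    Returns (fused_estimate, weights), with weights[0] belonging to the target.
    """
    raw = [1.0 / target_var]  # the target site is never down-weighted
    for est, var in zip(source_ests, source_vars):
        discrepancy = (est - target_est) ** 2 / (var + target_var)
        raw.append(math.exp(-penalty * discrepancy) / var)

    total = sum(raw)
    weights = [w / total for w in raw]
    estimates = [target_est] + list(source_ests)
    fused = sum(w * e for w, e in zip(weights, estimates))
    return fused, weights


# A source that agrees with the target shares the weight equally with it.
fused_close, weights_close = fuse_estimates(0.1, 0.01, [0.1], [0.01])

# A source far from the target is effectively ignored.
fused_far, weights_far = fuse_estimates(0.1, 0.01, [5.0], [0.01])
```

With this rule, a source whose estimate matches the target behaves like extra target data, while a strongly shifted source receives a weight near zero, so the fused estimate falls back to the target-only estimate.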
Authors
Yi Liu
Department of Statistics, North Carolina State University, Raleigh, NC, USA; Duke Clinical Research Institute, Durham, NC, USA
Alexander Levis
Carnegie Mellon University, Department of Statistics, Pittsburgh, PA, USA
Ke Zhu
Department of Statistics, North Carolina State University, Raleigh, NC, USA; Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
Shu Yang
Department of Statistics, North Carolina State University, Raleigh, NC, USA
Peter B. Gilbert
Member, Fred Hutchinson Cancer Center, and Department of Biostatistics, University of Washington
Biostatistics, vaccine clinical trials
Larry Han
Assistant Professor of Public Health and Health Sciences, Northeastern University
Causal Inference, Federated Learning, Survival Analysis, Infectious Diseases