Federated Learning with Incomplete Data: When to Use Complete Cases and When to Weight

📅 2026-05-19

🤖 AI Summary
This study addresses estimation bias caused by missing data in federated learning. It identifies conditions under which the complete-case (CC) estimator is preferable to the inverse probability weighting (IPW) estimator and, for settings where the CC estimator fails, proposes a calibrated weighting approach that combines candidate weight models across sites. The resulting federated estimator remains consistent provided that at least one candidate weight model is correctly specified, and its consistency conditions are stated at the site level, so the global estimator inherits validity from local properties. A sandwich variance estimator accounts for the uncertainty introduced by estimating the weights. The framework is illustrated by evaluating risk factors for 90-day mortality among patients with pleural infection treated with intrapleural enzyme therapy.
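Whether complete cases suffice depends on what drives the missingness. A classical illustration, separate from the paper's own site-level conditions: for a regression of Y on X, a complete-case fit stays consistent when missingness depends only on X, but it is biased when missingness depends on the outcome through an always-observed auxiliary variable V, and then inverse probability weighting by P(R = 1 | V) is needed. The sketch below is a minimal simulation under stated assumptions: three hypothetical sites, a linear model, known response probabilities `pi` (in practice they would be estimated), and a coordinator that receives only five summed statistics per site. All function names are hypothetical.

```python
import random


def simulate_site(n, scenario, rng):
    """Hypothetical site data as (x, y, r, pi); y is only used downstream when r == 1."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 2.0)  # true slope is 2
        v = y + rng.gauss(0.0, 0.5)              # auxiliary variable, always observed
        if scenario == "covariate":
            pi = 0.9 if x < 0 else 0.2           # missingness depends on x only
        else:
            pi = 0.9 if v < 1 else 0.2           # depends on v, hence on the outcome
        r = 1 if rng.random() < pi else 0
        rows.append((x, y, r, pi))
    return rows


def local_summaries(rows, weighted):
    """Site-level sufficient statistics for (weighted) least squares of y on x."""
    s = [0.0] * 5  # sums of w, w*x, w*y, w*x*x, w*x*y over complete cases
    for x, y, r, pi in rows:
        if not r:
            continue  # outcome unobserved: skip the record
        w = 1.0 / pi if weighted else 1.0
        for j, term in enumerate((1.0, x, y, x * x, x * y)):
            s[j] += w * term
    return s


def pooled_slope(summaries):
    """Coordinator adds the site summaries and solves the normal equations."""
    sw, sx, sy, sxx, sxy = (sum(col) for col in zip(*summaries))
    return (sw * sxy - sx * sy) / (sw * sxx - sx * sx)


def federated_fit(scenario, seed=7, sizes=(8000, 12000, 10000)):
    rng = random.Random(seed)
    sites = [simulate_site(n, scenario, rng) for n in sizes]
    cc = pooled_slope([local_summaries(s, weighted=False) for s in sites])
    ipw = pooled_slope([local_summaries(s, weighted=True) for s in sites])
    return cc, ipw


for scenario in ("covariate", "outcome"):
    cc, ipw = federated_fit(scenario)
    print(f"{scenario:9s}  CC slope = {cc:.3f}   IPW slope = {ipw:.3f}   (truth 2.0)")
```

Because weighted least squares depends on the data only through these sums, adding the site summaries reproduces the pooled fit exactly, so no individual records leave a site. In the `"covariate"` scenario both slopes should land near 2; in the `"outcome"` scenario the complete-case slope is attenuated while the IPW slope stays near 2, at the price of extra variability from weights as large as 5.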
📝 Abstract
Privacy constraints have driven the rise of federated learning (FL), which enables multi-site analyses without sharing individual participant data. We develop a framework for FL with missing data, identifying conditions under which the complete case (CC) estimator is preferred over the inverse probability weighting (IPW) estimator. For settings where the CC estimator fails, we introduce a calibrated weight estimation approach that combines candidate weighting models across sites and remains consistent if at least one is correctly specified. Consistency conditions are stated at the site level, ensuring that the federated estimator inherits validity from local properties. We derive a sandwich variance estimator that accounts for uncertainty in weight estimation, and illustrate the framework by evaluating risk factors for 90-day mortality among patients with pleural infections treated with intrapleural enzyme therapy.
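The calibrated weighting idea can be sketched in a simplified form. This is an assumption-laden illustration, not the authors' estimator: it uses empirical-likelihood calibration, a technique from the multiply robust missing-data literature, to combine K candidate response models at each site, then pools site-level means weighted by site size. Everything here is hypothetical: the simulated sites, the two candidate models (stratum response rates by `Z`, correct here, and by a noisy proxy `W`, misspecified), and the function names. Outcomes are missing at random given `Z`, so the complete-case mean is biased, calibrating on `W` alone stays biased, and calibrating on both candidates should recover the true mean of 3.5. The sandwich variance step is not shown.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gauss-Jordan with partial pivoting)."""
    k = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(k):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [u - f * v for u, v in zip(m[i], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def stratum_propensity(strata, r):
    """Candidate response model: P(R = 1 | stratum) fitted as the stratum response rate."""
    tot, resp = {}, {}
    for s, ri in zip(strata, r):
        tot[s] = tot.get(s, 0) + 1
        resp[s] = resp.get(s, 0) + ri
    return [resp[s] / tot[s] for s in strata]


def calibration_weights(candidates, r, max_iter=100, tol=1e-9):
    """Respondent weights 1 / (m * (1 + rho'g_i)); rho makes each candidate's
    weighted respondent mean score equal its full-site mean theta_k."""
    n, k = len(r), len(candidates)
    theta = [sum(c) / n for c in candidates]
    g = [[candidates[j][i] - theta[j] for j in range(k)] for i in range(n) if r[i]]
    def denoms(rho):
        return [1.0 + sum(a * b for a, b in zip(rho, gi)) for gi in g]
    def objective(d):  # concave dual objective, maximized at the calibrated rho
        return sum(math.log(di) for di in d)
    rho = [0.0] * k
    for _ in range(max_iter):
        d = denoms(rho)
        grad = [sum(gi[j] / di for gi, di in zip(g, d)) for j in range(k)]
        if max(abs(x) for x in grad) < tol:
            break
        hess = [[sum(gi[a] * gi[b] / di ** 2 for gi, di in zip(g, d)) for b in range(k)]
                for a in range(k)]
        step = solve(hess, grad)  # Newton direction
        t, f_old = 1.0, objective(d)
        for _halving in range(60):  # keep every 1 + rho'g_i positive, avoid overshooting
            cand = [x + t * s for x, s in zip(rho, step)]
            d_cand = denoms(cand)
            if min(d_cand) > 0 and objective(d_cand) > f_old - 1e-9:
                break
            t /= 2
        rho = cand
    d = denoms(rho)
    return [1.0 / (len(d) * di) for di in d]


def simulate_site(n, rng):
    """Hypothetical site: Y depends on Z, and response depends on Z only (MAR given Z)."""
    z, w, y, r = [], [], [], []
    for _ in range(n):
        zi = int(rng.random() < 0.5)
        z.append(zi)
        w.append(zi if rng.random() < 0.5 else int(rng.random() < 0.5))  # W: noisy proxy of Z
        y.append(2.0 + 3.0 * zi + rng.gauss(0.0, 1.0))  # population mean of Y is 3.5
        r.append(int(rng.random() < (0.3 if zi else 0.9)))
    return z, w, y, r


def site_estimates(z, w, y, r):
    """Local CC and calibrated means; y enters only where r == 1."""
    y_obs = [yi for yi, ri in zip(y, r) if ri]
    pi_z = stratum_propensity(z, r)  # correctly specified candidate
    pi_w = stratum_propensity(w, r)  # misspecified candidate

    def calibrated_mean(candidates):
        weights = calibration_weights(candidates, r)
        return sum(wt * yi for wt, yi in zip(weights, y_obs))

    return {"cc": sum(y_obs) / len(y_obs), "w_only": calibrated_mean([pi_w]),
            "both": calibrated_mean([pi_z, pi_w])}


rng = random.Random(2026)
sites = [simulate_site(n, rng) for n in (4000, 6000, 5000)]
local = [site_estimates(*site) for site in sites]  # each site shares only n and these numbers
n_total = sum(len(site[0]) for site in sites)
pooled = {key: sum(len(site[0]) * est[key] for site, est in zip(sites, local)) / n_total
          for key in ("cc", "w_only", "both")}
print({key: round(val, 3) for key, val in pooled.items()})  # truth is 3.5
```

The size-weighted average of site means is one simple aggregation choice among several. The calibration constraints force the weights to sum to one and to reproduce each candidate model's full-site mean score; when one candidate is correct, the weights then behave like normalized inverse response probabilities, which is why adding the misspecified candidate does not spoil consistency.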
Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Missing Data
Complete Case
Inverse Probability Weighting
Data Privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Missing Data
Complete Case Estimator
Inverse Probability Weighting
Calibrated Weighting
Jesus E. Vazquez
Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21231
Yicheng Shen
Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21231
Jason Akulian
UNC Chapel Hill
Interventional Pulmonary, Lung cancer, Pleural Disease
Chad Hochberg
Pulmonary and Critical Care Medicine, Johns Hopkins University, Baltimore, MD 21231
Theodore J. Iwashyna
Pulmonary and Critical Care Medicine, Johns Hopkins University, Baltimore, MD 21231
Elizabeth A. Stuart
Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21231
Jiayi Tong
Assistant Professor, Department of Biostatistics, Johns Hopkins University
Biostatistics, Biomedical informatics, Real-world evidence (RWE), Meta-analysis