Federated Learning with Incomplete Data: When to Use Complete Cases and When to Weight

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses estimation bias caused by missing data in federated learning. It identifies conditions under which the complete-case (CC) estimator is preferable to the inverse probability weighting (IPW) estimator and, for settings where the CC estimator fails, proposes a calibrated weighting approach that combines candidate weight models across sites. The resulting federated estimator remains consistent provided that at least one candidate weight model is correctly specified, with consistency conditions stated at the site level. A sandwich variance estimator accounts for uncertainty in the estimated weights. The framework is illustrated by evaluating risk factors for 90-day mortality among patients with pleural infection treated with intrapleural enzyme therapy.

📝 Abstract

Privacy constraints have driven the rise of federated learning (FL), which enables multi-site analyses without sharing individual participant data. We develop a framework for FL with missing data, identifying conditions under which the complete case (CC) estimator is preferred over the inverse probability weighting (IPW) estimator. For settings where the CC estimator fails, we introduce a calibrated weight estimation approach that combines candidate weighting models across sites and remains consistent if at least one is correctly specified. Consistency conditions are stated at the site level, ensuring that the federated estimator inherits validity from local properties. We derive a sandwich variance estimator that accounts for uncertainty in weight estimation, and illustrate the framework by evaluating risk factors for 90-day mortality among patients with pleural infections treated with intrapleural enzyme therapy.
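To make the CC versus IPW contrast concrete, the following is a minimal simulation sketch, not the paper's calibrated method. It assumes a toy setting invented for illustration: three sites, a binary covariate x, an outcome y with true mean 2.0, and an observation probability that depends on x. Each site shares only aggregate sums, and the function names (simulate_site, local_summaries, federated_means) are hypothetical.

```python
import numpy as np

def simulate_site(n, seed):
    """Toy site data (assumed setup): y is observed with probability depending on x."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)   # true mean of y is 1 + 2 * 0.5 = 2.0
    p_obs = np.where(x == 1, 0.9, 0.3)       # missingness depends on x only
    r = rng.binomial(1, p_obs)               # r = 1 means y is observed
    return x, y, r

def local_summaries(x, y, r):
    """Compute site-level sums; no individual records leave the site."""
    # Estimate the observation probability by the observed fraction
    # within each level of x (a saturated local weight model).
    pi_hat = np.empty(len(x))
    for level in (0, 1):
        mask = x == level
        pi_hat[mask] = r[mask].mean()
    y_obs = np.where(r == 1, y, 0.0)         # unobserved outcomes never enter the sums
    w = r / pi_hat                           # inverse probability weights
    return {
        "cc_sum": float(y_obs.sum()),
        "cc_n": float(r.sum()),
        "ipw_sum": float((w * y_obs).sum()),
        "ipw_n": float(w.sum()),
    }

def federated_means(summaries):
    """Pool the site-level sums into global CC and IPW (Hajek-type) means."""
    cc = sum(s["cc_sum"] for s in summaries) / sum(s["cc_n"] for s in summaries)
    ipw = sum(s["ipw_sum"] for s in summaries) / sum(s["ipw_n"] for s in summaries)
    return cc, ipw

sites = [simulate_site(5000, seed) for seed in (0, 1, 2)]
summaries = [local_summaries(*site) for site in sites]
cc_mean, ipw_mean = federated_means(summaries)
print(f"true mean 2.0 | CC {cc_mean:.3f} | IPW {ipw_mean:.3f}")
```

In this toy setup the CC mean is pulled toward the over-represented x = 1 group, while the weighted mean should land near 2.0 because the local weight model is correctly specified. The sketch does not cover the paper's calibration across candidate models or its sandwich variance estimator.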

Problem

Research questions and friction points this paper is trying to address.

Federated Learning

Missing Data

Complete Case

Inverse Probability Weighting

Data Privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Missing Data

Complete Case Estimator