Distributed Kaplan-Meier Analysis via the Influence Function with Application to COVID-19 and COVID-19 Vaccine Adverse Events

📅 2025-07-18
🤖 AI Summary
During the COVID-19 pandemic, multicenter studies of rare thromboembolic events were constrained by privacy concerns over sharing individual-level data. To address this, we propose a distributed Kaplan-Meier estimator based on influence functions, which supports survival analysis with statistical inference without pooling raw data. Each site shares only low-dimensional intermediate statistics, and curves are updated sequentially, site by site. Combined with inverse propensity weighting for covariate adjustment, the distributed estimator is unbiased in simulations and matches the efficiency of the combined-data estimator. Applied to data from Beaumont Health, Spectrum Health, and Michigan Medicine, the method estimates a covariate-adjusted incidence of blood clots of 3.13% (95% CI: [2.93, 3.35]) after SARS-CoV-2 infection, compared with 0.08% (95% CI: [0.08, 0.09]) after the first COVID-19 vaccine. This suggests that the protection vaccines provide against COVID-19-related clots outweighs the risk of vaccine-related adverse events, and shows the potential of distributed survival analysis for multicenter observational studies under privacy constraints.

📝 Abstract
During the COVID-19 pandemic, regulatory decision-making was hampered by a lack of timely and high-quality data on rare outcomes. Studying rare outcomes following infection and vaccination requires conducting multi-center observational studies, where sharing individual-level data is a privacy concern. In this paper, we conduct a multi-center observational study of thromboembolic events following COVID-19 and COVID-19 vaccination without sharing individual-level data. We accomplish this by developing a novel distributed learning method for constructing Kaplan-Meier (KM) curves and inverse propensity weighted KM curves with statistical inference. We sequentially update curves site-by-site using the KM influence function, which is a measure of the direction in which an observation should shift our estimate and so can be used to incorporate new observations without access to previous data. We show in simulations that our distributed estimator is unbiased and achieves equal efficiency to the combined data estimator. Applying our method to Beaumont Health, Spectrum Health, and Michigan Medicine data, we find a much higher covariate-adjusted incidence of blood clots after SARS-CoV-2 infection (3.13%, 95% CI: [2.93, 3.35]) compared to first COVID-19 vaccine (0.08%, 95% CI: [0.08, 0.09]). This suggests that the protection vaccines provide against COVID-19-related clots outweighs the risk of vaccine-related adverse events, and shows the potential of distributed survival analysis to provide actionable evidence for time-sensitive decision making.
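The abstract does not spell out the influence function or the site-by-site update rule, so the following is only a minimal sketch of the standard empirical Kaplan-Meier influence function on a single toy dataset, not the authors' implementation. The function name `km_with_influence` and the toy data are hypothetical. For subject i with observed time T_i, the value computed is -S(t) * [ n * 1{event by t} / Y(T_i) - n * sum over event times t_j <= min(T_i, t) of d_j / Y_j^2 ], where d_j is the number of events at t_j and Y_j the number at risk.

```python
import numpy as np

def km_with_influence(time, event, t):
    """Return the KM estimate S(t) and one empirical influence value per subject."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    # Distinct event times up to t.
    event_times = np.unique(time[event == 1])
    event_times = event_times[event_times <= t]
    # d_j: events at t_j; y_j: subjects still at risk at t_j.
    d = np.array([np.sum((time == tj) & (event == 1)) for tj in event_times])
    y = np.array([np.sum(time >= tj) for tj in event_times])
    surv = np.prod(1.0 - d / y)
    # Jump term: n / Y(T_i) if subject i has an event by time t, else 0.
    at_risk_own = np.array([np.sum(time >= ti) for ti in time])
    jump = n * ((event == 1) & (time <= t)) / at_risk_own
    # Compensator: n * sum of d_j / y_j^2 over event times t_j <= min(T_i, t).
    comp = np.array([n * np.sum((d / y**2)[event_times <= ti]) for ti in time])
    infl = -surv * (jump - comp)
    return surv, infl

# Toy data (hypothetical): follow-up times and event indicators (1 = event).
time = [2, 3, 3, 5, 7, 8]
event = [1, 1, 0, 1, 0, 1]
surv, infl = km_with_influence(time, event, t=6)
# Influence-based standard error of S(t): sqrt(mean(IF^2) / n).
se = np.sqrt(np.mean(infl**2) / len(time))
print(surv, se)
```

The influence values average to zero over the sample, and their mean square gives a plug-in variance estimate. How the paper uses these values to fold in a new site's observations is not described in this summary, so that step is left out here.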

Problem

Research questions and friction points this paper is trying to address.

Develop distributed Kaplan-Meier analysis for rare outcomes
Enable multi-center studies without sharing individual data
Assess thromboembolic risks post-COVID and vaccination

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed Kaplan-Meier analysis via influence function
Multi-center study without sharing individual-level data
Unbiased distributed estimator that matches the efficiency of the combined-data estimator
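To make the idea of a multi-center curve built without individual-level data concrete, here is a deliberately simpler baseline than the paper's influence-function update: each site shares only weighted event and at-risk totals on a common time grid, and a coordinator adds them up. This is a sketch under stated assumptions, not the authors' method. It assumes event times are recorded on a shared discrete grid (for example, whole days), and the names `site_summary` and `km_from_summaries`, the weights, and the toy data are all hypothetical. The weights stand in for inverse propensity weights; setting them to 1 gives the ordinary Kaplan-Meier curve.

```python
import numpy as np

def site_summary(time, event, weight, grid):
    """Run locally at a site: weighted event and at-risk totals per grid time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    weight = np.asarray(weight, dtype=float)
    d = np.array([np.sum(weight[(time == g) & (event == 1)]) for g in grid])
    y = np.array([np.sum(weight[time >= g]) for g in grid])
    return d, y

def km_from_summaries(summaries):
    """Coordinator step: add site totals, then form the product-limit curve."""
    d = sum(s[0] for s in summaries)
    y = sum(s[1] for s in summaries)
    # Hazard is 0 wherever nobody remains at risk.
    hazard = np.divide(d, y, out=np.zeros_like(d, dtype=float), where=y > 0)
    return np.cumprod(1.0 - hazard)

grid = np.arange(1, 9)  # shared grid: days 1..8 (assumed)

# Two hypothetical sites; only (d, y) leaves each site.
site_a = site_summary([2, 3, 3], [1, 1, 0], [1.0, 2.0, 1.0], grid)
site_b = site_summary([5, 7, 8], [1, 0, 1], [1.5, 1.0, 1.0], grid)
distributed = km_from_summaries([site_a, site_b])

# Reference: the same curve computed on pooled individual-level data.
pooled = km_from_summaries([site_summary(
    [2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1],
    [1.0, 2.0, 1.0, 1.5, 1.0, 1.0], grid)])
print(np.allclose(distributed, pooled))
```

Because the weighted totals are additive across sites, this baseline reproduces the pooled curve exactly when all times fall on the grid. It does not by itself provide the inference or the sequential update described in the abstract.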