Bayesian Federated Cause-of-Death Classification and Quantification Under Distribution Shift

📅 2025-05-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
In regions lacking medical certification of death, verbal autopsy (VA) is critical for inferring causes of death. However, existing VA algorithms degrade under distributional shift and rely on centralized training, which often conflicts with privacy requirements and logistical constraints. This paper proposes a novel Bayesian federated learning framework for VA that enables cross-site collaborative modeling without sharing raw individual-level data, while supporting both individual-level cause-of-death classification and population-level cause-specific mortality fraction (CSMF) estimation. The framework integrates Bayesian inference, distributionally robust optimization, and probabilistic quantification, with a modular design compatible with mainstream VA algorithms. Evaluated on two real-world VA datasets, it significantly outperforms single-site baselines in both individual prediction accuracy and CSMF estimation, and it matches or exceeds the performance of centralized joint modeling, indicating that it generalizes and adapts well in low-resource, privacy-sensitive settings.

📝 Abstract
In regions lacking medically certified causes of death, verbal autopsy (VA) is a critical and widely used tool to ascertain the cause of death through interviews with caregivers. Data collected by VAs are often analyzed using probabilistic algorithms. The performance of these algorithms often degrades due to distributional shift across populations. Most existing VA algorithms rely on centralized training, requiring full access to training data for joint modeling. This is often infeasible due to privacy and logistical constraints. In this paper, we propose a novel Bayesian Federated Learning (BFL) framework that avoids data sharing across multiple training sources. Our method enables reliable individual-level cause-of-death classification and population-level quantification of cause-specific mortality fractions (CSMFs), in a target domain with limited or no local labeled data. The proposed framework is modular, computationally efficient, and compatible with a wide range of existing VA algorithms as candidate models, facilitating flexible deployment in real-world mortality surveillance systems. We validate the performance of BFL through extensive experiments on two real-world VA datasets under varying levels of distribution shift. Our results show that BFL significantly outperforms the base models built on a single domain and achieves comparable or better performance compared to joint modeling.
Problem

Research questions and friction points this paper is trying to address.

Classify causes of death under distribution shift
Quantify cause-specific mortality fractions with limited or no local labeled data
Enable federated learning for privacy-preserving VA analysis
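
The two tasks named above, individual-level classification and population-level quantification, can be illustrated with a minimal sketch. It is not the paper's BFL method: the function names and the toy probability matrix are hypothetical, and it only assumes that some classifier has already produced a probability for each cause for each death. It contrasts averaging those probabilities with counting each death's most likely cause.

```python
import numpy as np


def csmf_from_probs(probs):
    """Probabilistic quantification: average each death's cause probabilities.

    probs has shape (n_deaths, n_causes); each row sums to 1.
    Hypothetical helper for illustration only.
    """
    probs = np.asarray(probs, dtype=float)
    return probs.mean(axis=0)


def csmf_classify_and_count(probs):
    """Classify-and-count: assign each death its most likely cause, then tally."""
    probs = np.asarray(probs, dtype=float)
    labels = probs.argmax(axis=1)  # individual-level classification
    counts = np.bincount(labels, minlength=probs.shape[1])
    return counts / counts.sum()


# Toy predicted probabilities for 4 deaths and 3 causes (made-up numbers).
toy_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.3, 0.1],
])

csmf_soft = csmf_from_probs(toy_probs)
csmf_hard = csmf_classify_and_count(toy_probs)
print("probability-averaged CSMF:", csmf_soft)
print("classify-and-count CSMF:  ", csmf_hard)
```

In this toy case the third cause never wins an individual classification, so classify-and-count assigns it a fraction of zero, while the averaged estimate keeps its share of probability mass. This is one reason classification and quantification are often treated as separate problems.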
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian Federated Learning avoids data sharing
Modular framework compatible with VA algorithms
Reliable classification and quantification under distribution shift
Yu Zhu
Department of Statistics, University of California, Santa Cruz
Zehang Richard Li
University of California, Santa Cruz
statistics, biostatistics, demography