FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization

📅 2025-07-30

🤖 AI Summary

This study addresses three challenges in applying federated learning (FL) to cardiovascular disease risk prediction: stringent privacy requirements, high communication overhead, and severe inter-institutional class imbalance. It integrates nonparametric models, specifically Random Forest and XGBoost, into a medical FL framework alongside parametric models. Three contributions are proposed: (1) tree-subset sampling, which reduces Random Forest communication overhead by 70%; (2) lightweight XGBoost-based feature extraction for building federated ensembles; and (3) a synchronized federated SMOTE mechanism that mitigates cross-institutional class imbalance. On the Framingham Heart Study dataset (4,238 records), federated XGBoost reaches an F1-score of 0.80, above the 0.78 of its centralized counterpart, and federated Random Forest reaches 0.81, matching non-federated performance. The communication-efficient strategies cut bandwidth consumption by 3.2× while preserving 95% accuracy, and F1-scores are up to 15% higher than those of existing FL frameworks. The authors present the work as the first practical integration of non-parametric models into federated healthcare systems.

📝 Abstract

Cardiovascular diseases (CVD) cause over 17 million deaths annually worldwide, highlighting the urgent need for privacy-preserving predictive systems. We introduce FedCVD++, an enhanced federated learning (FL) framework that integrates both parametric models (logistic regression, SVM, neural networks) and non-parametric models (Random Forest, XGBoost) for coronary heart disease risk prediction. To address key FL challenges, we propose: (1) tree-subset sampling that reduces Random Forest communication overhead by 70%, (2) XGBoost-based feature extraction enabling lightweight federated ensembles, and (3) federated SMOTE synchronization for resolving cross-institutional class imbalance. Evaluated on the Framingham dataset (4,238 records), FedCVD++ achieves state-of-the-art results: federated XGBoost (F1 = 0.80) surpasses its centralized counterpart (F1 = 0.78), and federated Random Forest (F1 = 0.81) matches non-federated performance. Additionally, our communication-efficient strategies reduce bandwidth consumption by 3.2× while preserving 95% accuracy. Compared to existing FL frameworks, FedCVD++ delivers up to 15% higher F1-scores and superior scalability for multi-institutional deployment. This work represents the first practical integration of non-parametric models into federated healthcare systems, providing a privacy-preserving solution validated under real-world clinical constraints.
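
The abstract does not spell out how tree-subset sampling works, so the following is only a minimal sketch of one plausible form: each client sends a random fraction of its locally trained trees, and the server pools them into a global forest that predicts by majority vote. The stumps stand in for trained trees, and `make_stump`, `sample_tree_subset`, `majority_vote`, and `keep_fraction` are hypothetical names chosen for illustration. Payload is counted in trees, which assumes trees of similar size.

```python
import random
from collections import Counter


def make_stump(feature, threshold):
    """Toy stand-in for a trained decision tree: one threshold on one feature."""
    return lambda row: int(row[feature] > threshold)


def sample_tree_subset(trees, keep_fraction, rng):
    """Client side: pick a random subset of local trees to send to the server."""
    k = max(1, round(len(trees) * keep_fraction))
    return rng.sample(trees, k)


def majority_vote(trees, row):
    """Server side: predict the class that most pooled trees vote for."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]


rng = random.Random(0)

# Three hypothetical clients, each holding a local forest of 100 stumps.
clients = [
    [make_stump(rng.randrange(3), rng.uniform(0.3, 0.7)) for _ in range(100)]
    for _ in range(3)
]

keep_fraction = 0.3  # assumption: sending 30% of trees gives a 70% reduction
global_forest = []
for forest in clients:
    global_forest.extend(sample_tree_subset(forest, keep_fraction, rng))

sent = len(global_forest)
total = sum(len(forest) for forest in clients)
reduction = 1 - sent / total  # fraction of trees that never leave the client

# Every stump threshold is below 0.9, so all pooled trees vote for class 1.
prediction = majority_vote(global_forest, [0.9, 0.9, 0.9])
```

Under this assumption, keeping 30% of the trees corresponds to the 70% reduction quoted for Random Forest. The 3.2× bandwidth figure is reported separately and may be measured differently.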

Problem

Research questions and friction points this paper is trying to address.

Enhancing federated learning for CVD risk prediction with mixed models

Reducing communication overhead in federated Random Forest by 70%

Addressing class imbalance in cross-institutional federated learning
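
The class-imbalance point can be illustrated with a small sketch. It assumes one simple form of synchronization that the summary does not confirm: clients report only their class counts, the server replies with a quota of synthetic minority rows per client based on a shared target ratio, and each client then runs classic SMOTE interpolation locally. `plan_quotas`, `smote_oversample`, `target_ratio`, and the toy records are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k, rng):
    """Client side: create n_new synthetic rows, each interpolated between a
    minority row and one of its k nearest minority neighbours (classic SMOTE)."""
    if len(minority) < 2:
        return []  # no neighbour to interpolate with
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def plan_quotas(client_counts, target_ratio):
    """Server side: from per-client (minority, majority) counts only, decide
    how many synthetic minority rows each client should generate."""
    return [
        max(0, math.ceil(target_ratio * majority) - minority)
        for minority, majority in client_counts
    ]


rng = random.Random(42)

# Two hypothetical hospitals; the 2-feature minority records are made up.
client_minority = [
    [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]],
    [[2.0, 2.0], [3.0, 3.0]],
]
client_majority_sizes = [10, 6]

# Only counts cross the network, never raw records.
counts = [(len(m), n) for m, n in zip(client_minority, client_majority_sizes)]
quotas = plan_quotas(counts, target_ratio=1.0)

balanced_minority = [
    rows + smote_oversample(rows, quota, k=2, rng=rng)
    for rows, quota in zip(client_minority, quotas)
]
```

In this sketch every client ends up with as many minority rows as majority rows, and synthetic rows stay inside the range spanned by that client's own minority data.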

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree-subset sampling reduces communication overhead

XGBoost-based feature extraction enables lightweight ensembles

Federated SMOTE synchronization resolves class imbalance
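
The summary does not say how the XGBoost-based feature extraction is carried out. One possible reading, sketched here as an assumption, is that each client trains a local XGBoost model, shares only its per-feature importance scores, and the server averages them (weighted by client size) to pick a small shared feature set for lightweight downstream models. The importance values, client sizes, and the helper names `aggregate_importances` and `top_k_features` are illustrative only. In practice the scores might come from a fitted model's `feature_importances_` attribute.

```python
def aggregate_importances(client_importances, client_sizes):
    """Server side: sample-size-weighted average of per-client importances."""
    total = sum(client_sizes)
    features = client_importances[0].keys()
    return {
        f: sum(imp[f] * n for imp, n in zip(client_importances, client_sizes)) / total
        for f in features
    }


def top_k_features(importances, k):
    """Return the k feature names with the highest aggregated importance."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


# Illustrative scores from two hypothetical clients (not results from the paper).
client_importances = [
    {"age": 0.40, "sysBP": 0.30, "totChol": 0.10, "cigsPerDay": 0.20},
    {"age": 0.30, "sysBP": 0.20, "totChol": 0.40, "cigsPerDay": 0.10},
]
client_sizes = [3000, 1000]  # hypothetical record counts per client

global_importances = aggregate_importances(client_importances, client_sizes)
selected = top_k_features(global_importances, k=2)
```

Only one short score vector per client is exchanged here, which is what would keep such a step lightweight compared with sending full models.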
