FederatedRSF: Federated Random Survival Forests for Partially Overlapping Medical Data

📅 2026-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of multicenter survival prediction, where patient-level clinical and genomic data cannot be shared due to privacy regulations and feature spaces across centers only partially overlap. The work proposes a federated random survival forest method that accommodates heterogeneous features by training survival trees locally and aggregating only those trees whose features are compatible across sites, thereby enabling collaborative modeling without exchanging raw data. Evaluated on simulated multicenter settings using the GBSG2 breast cancer dataset through repeated cross-validation, the proposed approach achieves performance comparable to centralized training as measured by Harrell’s C-index, effectively balancing privacy preservation with predictive efficacy.
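The redistribution step described above can be sketched in a few lines. This is a minimal illustration and not the FederatedRSF package's API: `TreeRecord`, `compatible_trees`, and `redistribute` are hypothetical names, and the compatibility rule used here (a tree is usable at a site only if every covariate it splits on is collected there) is an assumption about what "feature-compatible" means. The covariate names are columns of the GBSG2 dataset; the site names and feature subsets are made up.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class TreeRecord:
    """Hypothetical metadata for one locally trained survival tree."""
    tree_id: str
    origin_site: str
    features_used: FrozenSet[str]  # covariates the tree actually splits on


def compatible_trees(pool: Iterable[TreeRecord],
                     site_features: Iterable[str]) -> List[TreeRecord]:
    """Keep trees whose split features are all available at the site.

    Assumed rule: compatibility means features_used is a subset of the
    site's feature set.
    """
    available = frozenset(site_features)
    return [tree for tree in pool if tree.features_used <= available]


def redistribute(pool: List[TreeRecord],
                 site_feature_sets: Dict[str, Iterable[str]]) -> Dict[str, List[TreeRecord]]:
    """Build, for each site, the sub-forest it can evaluate locally."""
    return {site: compatible_trees(pool, feats)
            for site, feats in site_feature_sets.items()}


# Illustrative pool: only tree metadata is shared, never patient rows.
pool = [
    TreeRecord("a1", "site_a", frozenset({"age", "pnodes"})),
    TreeRecord("a2", "site_a", frozenset({"tsize", "progrec"})),
    TreeRecord("b1", "site_b", frozenset({"pnodes", "tgrade"})),
]
site_features = {
    "site_a": {"age", "pnodes", "tsize", "progrec"},
    "site_b": {"age", "pnodes", "tgrade"},
}

assignments = redistribute(pool, site_features)
for site, trees in assignments.items():
    print(site, [tree.tree_id for tree in trees])
```

Under this rule, site_b receives tree a1 from site_a because it only needs `age` and `pnodes`, but not a2, which splits on covariates site_b does not collect.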
📝 Abstract
Multi-center survival prediction can improve robustness and generalizability, yet privacy regulations and institutional governance often prevent pooling patient-level clinical and genomic data across institutions. In practice, deployment is further complicated by feature-space heterogeneity: sites collect different covariates or use different sequencing panels, so their feature sets only partially overlap. We present FederatedRSF, a Python package that implements federated random survival forests. It aggregates locally trained survival trees and redistributes only feature-compatible trees to each site, enabling inference under partial overlap without sharing raw data. We evaluate FederatedRSF on the GBSG2 breast cancer cohort distributed with the scikit-survival package, simulating feature heterogeneity across clients by withholding subsets of features and assessing discrimination with Harrell's concordance index (C-index) under repeated cross-validation and site splits. The results show that the federated model can achieve performance comparable to that of the centralized training setting.
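For readers unfamiliar with the evaluation metric, the following is a simplified pure-Python sketch of Harrell's C-index. It is not the evaluation code used in the study: the function name `harrell_c_index` and the toy data are illustrative, and pairs with tied observed times are skipped, which is a simplification compared with library implementations. A pair is comparable when the subject with the shorter observed time had an event; it is concordant when that subject also has the higher predicted risk.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[bool],
                    risk_scores: Sequence[float]) -> float:
    """Fraction of comparable pairs that the risk scores order correctly.

    Simplification (assumption): pairs with equal observed times are ignored.
    Ties in risk score count as half-concordant.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("times, events and risk_scores must have equal length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # subject i failed before j was last seen
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy example: the second subject is censored at time 8.
times = [5, 8, 12, 20]
events = [True, False, True, True]
risk = [0.9, 0.4, 0.1, 0.6]

print(harrell_c_index(times, events, risk))  # 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, so comparing the federated and centralized models reduces to comparing this number across the repeated splits.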
Problem

Research questions and friction points this paper is trying to address.

federated learning
survival prediction
feature heterogeneity
privacy-preserving
multi-center data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Random Survival Forests
Partially Overlapping Features
Privacy-Preserving
Multi-center Survival Prediction
Maryam Moradpour
Institute for Predictive Deep Learning in Medicine and Healthcare, Justus Liebig University Gießen, Germany; Hessian Center for Artificial Intelligence (hessian.AI), Darmstadt, Germany
Jonas Harriehausen
Institute for Predictive Deep Learning in Medicine and Healthcare, Justus Liebig University Gießen, Germany; Hessian Center for Artificial Intelligence (hessian.AI), Darmstadt, Germany
Amirreza Aleyasin
Department of Medical Informatics, University Medical Center Göttingen, Germany
Lion Philipp Wolf
Department of Medical Informatics, University Medical Center Göttingen, Germany
Youngjun Park
Max Planck Institute for Biology of Ageing
Computational Biology, Machine Learning, Bioinformatics
Anne-Christin Hauschild
University Professor at Justus-Liebig University Gießen
Machine Learning, Explainable AI, Bioinformatics, Biomedical Data Science, Biostatistics