🤖 AI Summary
Problem: Online collaborative medical prediction platforms face two privacy risks: leakage of patient attributes and extraction of physicians' models. Together with low prediction quality, these concerns can deter patients and physicians from participating. Method: This paper proposes the first distributed learning framework that jointly safeguards patient attribute privacy and physician model security. It formally defines and distinguishes two types of privacy attacks (attribute attacks targeting patients and model extraction attacks targeting physicians), and introduces a one-shot distributed learning mechanism. Grounded in statistical learning theory, the framework integrates differential privacy, federated learning, and secure aggregation, and requires no sharing of raw data or model parameters. Contribution/Results: Theoretically, the paper proves that the framework achieves optimal predictive performance under a given privacy budget. Empirical evaluation on synthetic and real-world healthcare datasets demonstrates resistance to both privacy threats and prediction accuracy comparable to centralized training, improving both the utility and the security of the platform.
📝 Abstract
Online collaborative medical prediction platforms offer convenience and real-time feedback by leveraging massive electronic health records. However, growing concerns about privacy and low prediction quality can deter patient participation and doctor cooperation. In this paper, we first clarify the privacy attacks, namely attribute attacks targeting patients and model extraction attacks targeting doctors, and specify the corresponding privacy principles. We then propose a privacy-preserving mechanism and integrate it into a novel one-shot distributed learning framework, aiming to simultaneously meet both privacy requirements and prediction performance objectives. Within the framework of statistical learning theory, we theoretically demonstrate that the proposed distributed learning framework can achieve the optimal prediction performance under specific privacy requirements. We further validate the developed privacy-preserving collaborative medical prediction platform through both toy simulations and real-world data experiments.
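To make the idea of one-shot distributed learning more concrete, the sketch below shows one way such a scheme could look. It is not the paper's algorithm: the ridge regression learner, the Laplace output perturbation, the size-weighted averaging rule, and every function and parameter name (`fit_local_ridge`, `local_noisy_prediction`, `one_shot_predict`, `noise_scale`) are illustrative assumptions. The only properties taken from the text are that each doctor's records and model stay local and that a single round of communication is used.

```python
import numpy as np


def fit_local_ridge(X, y, lam=1e-2):
    """Fit ridge regression on one doctor's private records (never shared).

    Assumption: a linear ridge model stands in for whatever local learner
    the actual framework uses.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def local_noisy_prediction(w, x_query, noise_scale, rng):
    """Return only a perturbed prediction for the query.

    The weights w and the training data stay with the doctor. Laplace noise
    is an illustrative perturbation, not a calibrated privacy guarantee.
    """
    return float(x_query @ w + rng.laplace(0.0, noise_scale))


def one_shot_predict(local_data, x_query, noise_scale=0.1, seed=0):
    """Aggregate local outputs in a single communication round.

    local_data: list of (X, y) pairs, one per doctor.
    The platform sees only one noisy scalar and a sample count per doctor,
    and combines them with a size-weighted average (an assumed rule).
    """
    rng = np.random.default_rng(seed)
    preds, sizes = [], []
    for X, y in local_data:
        w = fit_local_ridge(X, y)  # trained locally
        preds.append(local_noisy_prediction(w, x_query, noise_scale, rng))
        sizes.append(len(y))
    return float(np.average(preds, weights=sizes))
```

Calling `one_shot_predict(data, x_query)` with a list of per-doctor `(X, y)` arrays returns a single aggregated prediction. Raising `noise_scale` trades accuracy for stronger perturbation of each doctor's output, which is the kind of privacy and performance trade-off the abstract refers to.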