Generative clinical time series models trained on moderate amounts of patient data are privacy preserving

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates the intrinsic privacy properties of state-of-the-art generative models for clinical time-series data under moderate-scale training, without relying on additional privacy-preserving mechanisms. Using the MIMIC-IV dataset, we train representative generative models and conduct comprehensive privacy audits through multiple attack vectors, including membership inference and attribute inference attacks, while further assessing cross-dataset generalization of these attacks using the eICU dataset. Our findings demonstrate that, given sufficient training data, current inference attacks largely fail against generated multivariate time-series data, indicating that such models inherently offer strong privacy protection. Moreover, incorporating explicit privacy mechanisms like differential privacy yields negligible privacy gains while significantly degrading data utility, suggesting that generative modeling alone may suffice for practical privacy preservation in this domain.
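The privacy audits mentioned above rest on attacks such as membership inference against the synthetic data. As a rough illustration only (not the attack battery used in the paper), a simple distance-to-closest-record membership test can be sketched as follows; the threshold, feature dimensionality, and data are hypothetical stand-ins:

```python
import numpy as np

def dcr_membership_scores(candidates, synthetic):
    """Distance from each candidate record to its closest synthetic record.

    A suspiciously small distance suggests the candidate record may have
    been in the generator's training set -- a classic membership signal.
    """
    # Pairwise Euclidean distances, shape (n_candidates, n_synthetic)
    diffs = candidates[:, None, :] - synthetic[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min(axis=1)

def infer_membership(candidates, synthetic, threshold):
    # Flag records whose closest synthetic neighbour is closer than the threshold.
    return dcr_membership_scores(candidates, synthetic) < threshold

rng = np.random.default_rng(0)
synthetic = rng.normal(size=(1000, 8))     # stand-in for generated clinical features
# Near-copies of "training" records vs. fresh unseen records
members = synthetic[:5] + rng.normal(scale=0.01, size=(5, 8))
non_members = rng.normal(size=(5, 8))

flags = infer_membership(np.vstack([members, non_members]), synthetic, threshold=0.5)
```

When the generator memorizes training records, near-copies surface in the synthetic data and such a test flags them; the study's finding is that with enough training data this signal (and stronger attack variants) largely vanishes.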

📝 Abstract
Sharing medical data for machine learning model training purposes is often impossible due to the risk of disclosing identifying information about individual patients. Synthetic data produced by generative artificial intelligence (genAI) models trained on real data is often seen as one possible solution to comply with privacy regulations. While powerful genAI models for heterogeneous hospital time series have recently been introduced, such modeling does not guarantee privacy protection, as the generated data may still reveal identifying information about individuals in the models' training cohort. Applying established privacy mechanisms to generative time series models, however, proves challenging: post-hoc data anonymization through k-anonymization or similar techniques is limited, while model-centered privacy mechanisms that implement differential privacy (DP) may lead to unstable training, compromising the utility of generated data. Given these known limitations, privacy audits for generative time series models are currently indispensable regardless of the concrete privacy mechanisms applied to models and/or data. In this work, we use a battery of established privacy attacks to audit state-of-the-art hospital time series models, trained on the public MIMIC-IV dataset, with respect to privacy preservation. Furthermore, the eICU dataset was used to mount a privacy attack against the synthetic data generator trained on the MIMIC-IV dataset. Results show that established privacy attacks are ineffective against generated multivariate clinical time series when synthetic data generators are trained on large enough training datasets. Furthermore, we discuss how the use of existing DP mechanisms for these synthetic data generators would not bring the desired improvement in privacy but only a decrease in utility for machine learning prediction tasks.
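The DP mechanisms the abstract refers to typically operate at the optimizer level, as in DP-SGD: per-example gradient clipping followed by Gaussian noise. A minimal sketch of that aggregation step, with illustrative (not paper-specified) clip norm and noise multiplier, shows where the utility cost comes from:

```python
import numpy as np

def dp_sgd_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD aggregation step (sketch): clip each example's gradient
    to `clip_norm`, sum, add Gaussian noise scaled to the clip norm, and
    average. The injected noise is what can destabilize generative-model
    training and degrade the utility of the resulting synthetic data."""
    if rng is None:
        rng = np.random.default_rng()
    # Per-example L2 norms, shape (batch, 1)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down any gradient whose norm exceeds the clip threshold
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise calibrated to the clipping bound
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

# Hypothetical batch of 32 per-example gradients over 10 parameters
grads = np.random.default_rng(1).normal(size=(32, 10))
private_grad = dp_sgd_gradient(grads)
```

Tighter privacy budgets require larger noise multipliers or smaller batches, which is the mechanism behind the utility loss the paper reports.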
Problem

Research questions and friction points this paper is trying to address.

privacy preservation
generative models
clinical time series
synthetic data
privacy attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative AI
clinical time series
privacy auditing
synthetic data
differential privacy
Rustam Zhumagambetov
Mathematical Modelling and Data Analysis Department, Physikalisch-Technische Bundesanstalt (PTB), Abbestraße 2-12, Berlin, 10587, Germany
Niklas Giesa
Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Invalidenstraße 90, Berlin, 10117, Germany
Sebastian D. Boie
Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Invalidenstraße 90, Berlin, 10117, Germany
Stefan Haufe
Technische Universität Berlin, Physikalisch-Technische Bundesanstalt, Charité - Universitätsmedizin
Machine Learning · Signal Processing · Neuroimaging · Brain Connectivity · AI in Biomedicine