Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the privacy–utility trade-off in clinical time-series forecasting by using embedding-space data augmentation to defend against membership inference attacks (MIAs). The authors retrain forecasting models on synthetic samples and compare three augmentation strategies: zeroth-order optimization (ZOO), a variant of ZOO constrained by principal component analysis (ZOO-PCA), and MixUp. The synthetic samples must resemble the original training data closely enough to confuse a loss-based attacker while adding enough novelty to help the model generalize to unseen data. In the reported experiments, ZOO-PCA yields the largest reduction in the attacker's true-positive to false-positive ratio (TPR/FPR) without sacrificing performance on test data.
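To make the idea of a PCA-constrained zeroth-order update concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that the PCA constraint means perturbing an embedding only along the top-k principal directions of the training embeddings, and that the gradient is estimated with two-point finite differences. The names `pca_basis`, `zoo_pca_step`, and `objective`, the toy objective, and all parameter values are hypothetical.

```python
import numpy as np

def pca_basis(embeddings, k):
    """Return the top-k principal directions (as rows) of the embeddings."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def zoo_pca_step(objective, z, basis, step=0.1, mu=1e-3, n_dirs=8, rng=None):
    """One zeroth-order descent step on `objective`, restricted to span(basis).

    Only function values of `objective` are used (no analytic gradients).
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(z, dtype=float)
    for _ in range(n_dirs):
        # Random probe direction inside the PCA subspace (assumed constraint).
        u = rng.standard_normal(basis.shape[0]) @ basis
        # Two-point finite-difference estimate of the directional derivative.
        slope = (objective(z + mu * u) - objective(z - mu * u)) / (2 * mu)
        grad += slope * u
    grad /= n_dirs
    return z - step * grad

# Toy data: 50 random "embeddings" of dimension 6 (made-up, not clinical data).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((50, 6))
basis = pca_basis(embeddings, k=3)

# Hypothetical objective: squared distance to the mean embedding.
target = embeddings.mean(axis=0)
objective = lambda v: float(np.sum((v - target) ** 2))

z = embeddings[0]
z_new = zoo_pca_step(objective, z, basis, rng=rng)
```

Because every probe direction lies in the PCA subspace, the synthetic embedding `z_new` differs from `z` only along directions in which the training embeddings already vary, which is one plausible reading of how such a constraint could keep synthetic samples close to the data distribution.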

📝 Abstract
Balancing strong privacy guarantees with high predictive performance is critical for time series forecasting (TSF) tasks involving Electronic Health Records (EHR). In this study, we explore how data augmentation can mitigate Membership Inference Attacks (MIAs) on TSF models. We show that retraining with synthetic data can substantially reduce the effectiveness of loss-based MIAs by reducing the attacker's true-positive to false-positive ratio (TPR/FPR). The key challenge is generating synthetic samples that closely resemble the original training data to confuse the attacker, while also introducing enough novelty to enhance the model's ability to generalize to unseen data. We examine multiple augmentation strategies to strengthen model resilience without sacrificing accuracy: Zeroth-Order Optimization (ZOO), a variant of ZOO constrained by Principal Component Analysis (ZOO-PCA), and MixUp. Our experimental results show that ZOO-PCA yields the best reductions in the TPR/FPR ratio of MIAs without sacrificing performance on test data.
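The TPR/FPR ratio of a loss-based attack can be illustrated with a simple threshold rule. The sketch below assumes the attacker labels a sample as a training member when its per-sample loss falls below a fixed threshold; the attack configuration actually used in the paper is not described here, and the function names and loss values are made up for illustration.

```python
import numpy as np

def loss_threshold_attack(losses, threshold):
    """Predict 'member' (True) when the per-sample loss is below the threshold."""
    return np.asarray(losses) < threshold

def tpr_fpr(member_losses, nonmember_losses, threshold):
    """Return (TPR, FPR) of the threshold attack.

    TPR: fraction of true members flagged as members.
    FPR: fraction of non-members wrongly flagged as members.
    """
    tpr = loss_threshold_attack(member_losses, threshold).mean()
    fpr = loss_threshold_attack(nonmember_losses, threshold).mean()
    return float(tpr), float(fpr)

# Toy per-sample losses (illustrative numbers only, not from the paper).
member_losses = np.array([0.10, 0.20, 0.15, 0.60])     # seen during training
nonmember_losses = np.array([0.50, 0.70, 0.25, 0.90])  # held-out samples

tpr, fpr = tpr_fpr(member_losses, nonmember_losses, threshold=0.3)
ratio = tpr / fpr if fpr > 0 else float("inf")
print(f"TPR={tpr:.2f} FPR={fpr:.2f} TPR/FPR={ratio:.2f}")
```

A ratio near 1 means the attacker flags members and non-members at about the same rate, so the loss carries little membership signal. A defense of the kind described above aims to push the ratio toward 1 while keeping test loss low.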
Problem

Research questions and friction points this paper is trying to address.

Preventing membership inference attacks in clinical time series forecasting
Balancing privacy guarantees with predictive performance in EHR models
Generating synthetic data that confuses attackers while maintaining generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embedding-space data augmentation prevents membership inference attacks
ZOO-PCA method reduces attack effectiveness while maintaining accuracy
Synthetic data generation balances privacy protection and model generalization
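MixUp, one of the augmentation strategies named in the abstract, forms convex combinations of pairs of samples. The sketch below applies it to embedding vectors and continuous forecasting targets. The function name `mixup_embeddings`, the Beta parameter `alpha`, and the toy data are assumptions for illustration; the paper's exact configuration is not given here.

```python
import numpy as np

def mixup_embeddings(z, y, alpha=0.2, rng=None):
    """Mix each embedding and target with a randomly chosen partner.

    z: array of shape (n, d) with embedding vectors.
    y: array of shape (n,) or (n, h) with forecasting targets.
    Returns the mixed embeddings, mixed targets, and the mixing weight.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # mixing weight in [0, 1]
    perm = rng.permutation(len(z))    # random partner for every sample
    z_mix = lam * z + (1 - lam) * z[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return z_mix, y_mix, lam

# Toy batch: 8 embeddings of dimension 4 with scalar targets (made-up data).
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 4))
y = rng.standard_normal(8)

z_mix, y_mix, lam = mixup_embeddings(z, y, rng=rng)
```

Each mixed sample lies on the line segment between two real samples, so it stays inside the range of the original embeddings while not coinciding with any single training point unless `lam` is exactly 0 or 1.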
Authors

Marius Fracarolli
Department of Computational Linguistics, Heidelberg University, Germany
Michael Staniek
Department of Computational Linguistics, Heidelberg University, Germany
Stefan Riezler
Professor for Statistical Natural Language Processing, Heidelberg University
Natural Language Processing, Machine Learning, Artificial Intelligence