🤖 AI Summary
This study addresses a gap in synthetic data generation (SDG) research, which has predominantly focused on privacy attacks initiated by data recipients while overlooking internal adversaries (such as data owners or synthetic data providers) who may degrade data quality by perturbing the real data. The work formally introduces this internal threat model and proposes targeted perturbation strategies based on label flipping and feature-importance-based manipulation. Through systematic evaluation across multiple mainstream SDG frameworks, the experiments demonstrate that even small perturbations can substantially impair downstream task performance and increase statistical divergence between real and synthetic data. These findings reveal a pronounced vulnerability in current SDG pipelines regarding data integrity and underscore the need for robustness and integrity verification mechanisms in synthetic data workflows.
📝 Abstract
Synthetic Data Generation (SDG) can be used to facilitate privacy-preserving data sharing. However, most existing research focuses on privacy attacks where the adversary is the recipient of the released synthetic data and attempts to infer sensitive information from it. This study investigates quality degradation attacks initiated by adversaries who have access to the real dataset or control over the generation process, such as the data owner, the synthetic data provider, or potential intruders. We formalize a corresponding threat model and empirically evaluate the effectiveness of targeted manipulations of real data (e.g., label flipping and feature-importance-based interventions) on the quality of generated synthetic data. The results show that even small perturbations can substantially reduce downstream predictive performance and increase statistical divergence, exposing vulnerabilities within SDG pipelines. This study highlights the need to integrate integrity verification and robustness mechanisms, alongside privacy protection, to ensure the reliability and trustworthiness of synthetic data sharing frameworks.
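To make the label-flipping manipulation mentioned in the abstract concrete, the following is a minimal sketch of how an insider might perturb a small fraction of labels in the real dataset before it is passed to a generator. It is an illustration under stated assumptions, not the paper's procedure: the function name `flip_labels`, the binary 0/1 label encoding, and the uniform random choice of rows are all hypothetical. The attacks studied in the paper are described as targeted, so the actual row selection likely differs.

```python
import numpy as np

def flip_labels(y, fraction, seed=0):
    """Return a copy of binary labels with a given fraction flipped.

    Assumptions (not from the paper): labels are encoded as 0/1 and the
    rows to flip are chosen uniformly at random without replacement.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    n_flip = int(round(fraction * y.size))
    # Pick distinct row indices so each selected label is flipped exactly once.
    idx = rng.choice(y.size, size=n_flip, replace=False)
    y_flipped = y.copy()
    y_flipped[idx] = 1 - y_flipped[idx]
    return y_flipped, idx

# Toy labels: 100 rows, balanced classes.
y = np.array([0, 1] * 50)
y_perturbed, flipped_idx = flip_labels(y, fraction=0.05)
print("labels changed:", int((y != y_perturbed).sum()))
```

In an evaluation along the lines the abstract describes, one would then train the same SDG model on the clean and the perturbed data and compare downstream predictive performance and a statistical divergence measure between the two synthetic outputs.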