SynQP: A Framework and Metrics for Evaluating the Quality and Privacy Risk of Synthetic Data

📅 2025-08-26
🏛️ Conference on Privacy, Security and Trust
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of open frameworks and benchmarks for evaluating privacy risks in synthetic health data, which hinders its safe deployment. To this end, we propose SynQP, the first open-source framework enabling systematic benchmarking of both utility and privacy risks of synthetic data without access to real sensitive records, using simulated sensitive data instead. We introduce a more equitable metric for identity disclosure risk and conduct a comprehensive evaluation integrating differential privacy (DP), CTGAN generative models, membership inference attacks (MIA), and identity disclosure risk (IDR). Experimental results demonstrate that non-private models achieve near-perfect utility (≥0.97), while DP-enhanced models consistently reduce both identity disclosure and MIA risks below the regulatory threshold of 0.09.

📝 Abstract
The use of synthetic data in health applications raises privacy concerns, yet the lack of open frameworks for privacy evaluations has slowed its adoption. A major challenge is the absence of accessible benchmark datasets for evaluating privacy risks, due to difficulties in acquiring sensitive data. To address this, we introduce SynQP, an open framework for benchmarking privacy in synthetic data generation (SDG) using simulated sensitive data, ensuring that original data remains confidential. We also highlight the need for privacy metrics that fairly account for the probabilistic nature of machine learning models. As a demonstration, we use SynQP to benchmark CTGAN and propose a new identity disclosure risk metric that offers a more accurate estimation of privacy risks compared to existing approaches. Our work provides a critical tool for improving the transparency and reliability of privacy evaluations, enabling safer use of synthetic data in health-related applications. Our privacy assessments (Table II) reveal that DP consistently lowers both identity disclosure risk (SD-IDR) and membership inference attack risk (SD-MIA), with all DP-augmented models staying below the 0.09 regulatory threshold. Code available at https://github.com/CAN-SYNH/SynQP
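To make the identity-disclosure idea concrete, here is a minimal sketch of a *naive* re-linkage estimate: the fraction of synthetic records that exactly match some sensitive record on a set of quasi-identifiers. This is not the paper's SD-IDR metric (which accounts for the probabilistic nature of the models); the function name `naive_idr`, the field names, and the toy records are all illustrative assumptions.

```python
# Hypothetical sketch of a naive identity disclosure risk (IDR) estimate.
# NOT the paper's SD-IDR metric; field names and records are made up.

def naive_idr(synthetic, sensitive, quasi_ids):
    """Fraction of synthetic records that exactly match some sensitive
    record on the chosen quasi-identifier fields."""
    sensitive_keys = {tuple(rec[q] for q in quasi_ids) for rec in sensitive}
    matches = sum(
        1 for rec in synthetic
        if tuple(rec[q] for q in quasi_ids) in sensitive_keys
    )
    return matches / len(synthetic)

# Simulated "sensitive" records (stand-ins for real data, in the spirit of
# SynQP's use of simulated sensitive data) and synthetic records from some
# generator.
sensitive = [{"age": 34, "zip": "N2L", "dx": "flu"},
             {"age": 51, "zip": "M5V", "dx": "copd"}]
synthetic = [{"age": 34, "zip": "N2L", "dx": "cold"},  # quasi-ID re-link
             {"age": 29, "zip": "K1A", "dx": "flu"},
             {"age": 60, "zip": "V6B", "dx": "flu"}]

risk = naive_idr(synthetic, sensitive, quasi_ids=("age", "zip"))
print(risk)  # 1 of 3 synthetic records re-links on (age, zip)
```

Exact matching like this over-counts risk for continuous fields and ignores match uncertainty, which is precisely the kind of unfairness the paper's proposed metric is meant to correct.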
Problem

Research questions and friction points this paper is trying to address.

synthetic data
privacy risk
benchmarking
health applications
privacy metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic Data
Privacy Risk Evaluation
Benchmarking Framework
Identity Disclosure Risk
Differential Privacy
Bing Hu
Unknown affiliation
Machine Learning, Data Mining, Statistics
Yixin Li
Stony Brook University
PET Instrument, Medical Imaging, X-ray Imaging
Asma Bahamyirou
Public Health Agency of Canada, Government of Canada
Helen Chen
Public Health Sciences, University of Waterloo