A Consensus Privacy Metrics Framework for Synthetic Data

📅 2025-03-06
đŸ€– AI Summary
Privacy assessments of synthetic data currently lack standardized, quantifiable metrics, particularly for identity, membership, and attribute disclosure risks. Method: The authors convened an expert panel and used the Delphi method to reach consensus on a framework for evaluating privacy in synthetic data, including the conceptual boundaries of these three risk categories. The panel concluded that prevailing similarity-based metrics fail to measure identity disclosure and discouraged their use, and it did not consider a differential privacy budget other than close to zero to be interpretable. Contribution/Results: The framework gives precise metric recommendations for membership and attribute disclosure and identifies specific opportunities for future research that could support wider adoption of synthetic data.

📝 Abstract
Synthetic data generation is one approach for sharing individual-level data. However, to meet legislative requirements, it is necessary to demonstrate that the individuals' privacy is adequately protected. There is no consolidated standard for measuring privacy in synthetic data. Through an expert panel and consensus process, we developed a framework for evaluating privacy in synthetic data. Our findings indicate that current similarity metrics fail to measure identity disclosure, and their use is discouraged. For differentially private synthetic data, a privacy budget other than close to zero was not considered interpretable. There was consensus on the importance of membership and attribute disclosure, both of which involve inferring personal information about an individual without necessarily revealing their identity. The resultant framework provides precise recommendations for metrics that address these types of disclosures effectively. Our findings further present specific opportunities for future research that can help with widespread adoption of synthetic data.
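To make the remark about privacy budgets concrete, the following is a minimal sketch based on the standard definition of pure Δ-differential privacy, not on the panel's own reasoning. Under that definition, the ratio of output probabilities on two neighbouring datasets is at most e^Δ, so an adversary's posterior odds that a target record was in the training data are at most e^Δ times the prior odds. The function names and the chosen Δ values are illustrative assumptions.

```python
import math


def dp_likelihood_ratio_bound(epsilon: float) -> float:
    """Upper bound e^epsilon on the output-probability ratio under pure epsilon-DP."""
    return math.exp(epsilon)


def dp_posterior_bound(prior: float, epsilon: float) -> float:
    """Upper bound on an adversary's posterior belief that a target is a member.

    Posterior odds are bounded by e^epsilon times the prior odds
    (standard consequence of pure epsilon-DP; names here are hypothetical).
    """
    ratio = math.exp(epsilon)
    return ratio * prior / (ratio * prior + (1.0 - prior))


if __name__ == "__main__":
    prior = 0.5  # assumed prior belief that the target is in the training data
    for eps in (0.1, 1.0, 5.0, 10.0):
        bound = dp_posterior_bound(prior, eps)
        print(f"epsilon={eps:>4}: ratio bound={dp_likelihood_ratio_bound(eps):.2f}, "
              f"posterior bound={bound:.4f}")
```

With a prior of 0.5, the bound stays near 0.5 for small Δ and approaches 1 as Δ grows, which is one way to see why a large budget offers little that can be read as a guarantee.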
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized privacy metrics for synthetic data.
Current metrics fail to measure identity disclosure effectively.
Need for precise metrics addressing membership and attribute disclosure.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed a consensus framework for privacy metrics.
Identified limitations in current similarity metrics.
Focused on membership and attribute disclosure risks.
Lisa Pilgram
School of Epidemiology and Public Health, University of Ottawa, Ontario, Canada; CHEO Research Institute, Ontario, Canada; Department of Nephrology and Medical Intensive Care, CharitĂ© – UniversitĂ€tsmedizin Berlin, Berlin, Germany
F. Dankar
CHEO Research Institute, Ontario, Canada
Jorg Drechsler
Department for Statistical Methods, Institute for Employment Research, Nuernberg, Germany; Institute for Statistics, Ludwig-Maximilians-UniversitÀt, Munich, Germany; Joint Program in Survey Methodology, University of Maryland, USA
Mark Elliot
The Cathie Marsh Institute Research, School of Social Sciences, University of Manchester, Manchester, United Kingdom
Josep Domingo-Ferrer
Distinguished Full Professor, Universitat Rovira i Virgili, Director-CYBERCAT, FIEEE, ACM DS
Data protection, Privacy, Cybersecurity, Machine learning, Statistical Disclosure Control
Paul Francis
Max Planck Institute for Software Systems, Germany
Murat Kantarcıoǧlu
Department of Computer Science, Virginia Tech, USA
Linglong Kong
Professor, Canada Research Chair in Statistical Learning, UAlberta, and Canada CIFAR AI Chair, Amii
Functional and Neuroimaging Data Analysis, Robust Statistics and Quantile Regression, and Statistical Machine Learning
B. Malin
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Computer Science, Vanderbilt University, Nashville, Tennessee, USA
Krishnamurty Muralidhar
Department of Marketing and Supply Chain Management, University of Oklahoma, Oklahoma, USA
Puja Myles
Medicines and Healthcare products Regulatory Agency, London, UK
Fabian Prasser
Berlin Institute of Health @ Charité - UniversitÀtsmedizin Berlin
Medical Informatics, Information Integration, Data Privacy
J. Raisaro
Biomedical Data Science Center, University Hospital Lausanne, Lausanne, Switzerland
Chao Yan
Instructor at DBMI, VUMC; CS PhD from Vanderbilt U
AI for medicine, Synthetic health data, Privacy, Fairness
K. E. Emam
School of Epidemiology and Public Health, University of Ottawa, Ontario, Canada; CHEO Research Institute, Ontario, Canada