A Review of Privacy Metrics for Privacy-Preserving Synthetic Data Generation

📅 2025-07-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses three core challenges in privacy-preserving synthetic data generation (PP-SDG): opaque privacy loss interpretation, non-transparent risk semantics of the differential privacy parameter ε, and ambiguous, incomparable definitions across diverse privacy metrics (PMs). To resolve these, we systematically survey and, for the first time, formally unify the mathematical definitions of 17 mainstream PMs—explicitly specifying their underlying assumptions, implicit premises, and analytical expressions. Grounded in differential privacy theory, we integrate information-theoretic and statistical inference principles to analyze each PM’s computational model and applicability boundaries. Based on this analysis, we propose the first comprehensive PM taxonomy, rigorously characterized along three dimensions: completeness, consistency, and interpretability. This taxonomy substantially enhances transparency and standardization in privacy risk assessment and provides both a rigorous theoretical foundation and a practical evaluation framework for privacy–utility trade-offs in PP-SDG mechanisms.

📝 Abstract
Privacy-Preserving Synthetic Data Generation (PP-SDG) has emerged to produce synthetic datasets from personal data while maintaining privacy and utility. Differential privacy (DP) is the property of a PP-SDG mechanism that establishes how well protected individuals are when sharing their sensitive data. It is, however, difficult to interpret the privacy loss ($\varepsilon$) expressed by DP. To make the actual risk associated with the privacy loss more transparent, multiple privacy metrics (PMs) have been proposed to assess the privacy risk of the data. These PMs are utilized in separate studies to assess newly introduced PP-SDG mechanisms. Consequently, these PMs embody the same assumptions as the PP-SDG mechanisms they were made to assess. Therefore, a thorough definition of how these metrics are calculated is necessary. In this work, we present the assumptions and mathematical formulations of 17 distinct privacy metrics.
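To give a feel for why $\varepsilon$ is hard to interpret directly, here is a minimal illustrative sketch (not from the paper): under pure $\varepsilon$-DP, Bayes' rule bounds how much an adversary's belief that a target record was in the input can move away from their prior, since the likelihood ratio of any output is at most $e^\varepsilon$.

```python
import math

def dp_posterior_bound(epsilon: float, prior: float = 0.5) -> float:
    """Upper bound on an adversary's posterior belief that a target record
    was present, given an epsilon-DP output and the stated prior.

    Pure epsilon-DP caps the likelihood ratio at exp(epsilon), so the
    posterior odds are at most exp(epsilon) times the prior odds.
    """
    odds = prior / (1.0 - prior)
    bounded_odds = math.exp(epsilon) * odds
    return bounded_odds / (1.0 + bounded_odds)

# Smaller epsilon keeps the posterior close to the prior of 0.5.
for eps in (0.1, 1.0, 5.0):
    print(f"eps={eps}: posterior <= {dp_posterior_bound(eps):.3f}")
```

Even this simple translation shows the non-linearity: $\varepsilon = 5$ already permits near-certain inference, which is one reason the paper argues for complementary, more interpretable PMs.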
Problem

Research questions and friction points this paper is trying to address.

Interpreting privacy loss in differential privacy mechanisms
Assessing privacy risk with multiple privacy metrics
Defining calculations for 17 distinct privacy metrics
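As a hypothetical sketch of the kind of metric formulation the paper surveys (the paper itself provides no code, and this may or may not match any of its 17 PMs exactly), one commonly used PM for synthetic data is distance to closest record (DCR), which flags synthetic rows that sit suspiciously close to real training rows:

```python
import numpy as np

def distance_to_closest_record(synthetic: np.ndarray, real: np.ndarray) -> np.ndarray:
    """Euclidean distance from each synthetic row to its nearest real row.

    Very small distances suggest a synthetic record may be a near-copy of a
    training record, i.e. a potential privacy leak.
    """
    # Pairwise distances via broadcasting: shape (n_synthetic, n_real).
    diffs = synthetic[:, None, :] - real[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=2)
    return pairwise.min(axis=1)

# Toy data: the first synthetic row sits close to a real record.
real = np.array([[0.0, 1.0], [3.0, 4.0]])
synthetic = np.array([[0.0, 0.0], [10.0, 10.0]])
print(distance_to_closest_record(synthetic, real))
```

Even for a metric this simple, results depend on implicit choices (distance function, feature scaling, how ties are handled), which is exactly the kind of unstated assumption the paper aims to make explicit.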
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses differential privacy for data protection
Introduces multiple privacy risk metrics
Reviews 17 distinct privacy metric formulations