🤖 AI Summary
Problem: Existing tabular data synthesis methods lack a unified, comparable evaluation framework because metrics are fragmented and standardized benchmarks are missing. The gap is most visible when comparing privacy guarantees (differential privacy vs. heuristic approaches) and model paradigms (diffusion models and LLMs vs. marginal-based statistical methods).
Method: We introduce a systematic evaluation framework for privacy-preserving tabular data synthesis. It critiques existing evaluation metrics and proposes new ones along three dimensions (fidelity, privacy, and utility), and it defines a unified tuning objective built on those metrics so that diffusion-based, LLM-based, and marginal-based synthesizers can be compared fairly.
Results: Extensive experiments with 8 types of synthesizers on 12 real-world datasets reveal trade-off patterns among the methods, and the findings suggest new directions for privacy-preserving synthetic data generation.
📝 Abstract
Data synthesis has been advocated as an important approach for utilizing data while protecting data privacy. A large number of tabular data synthesis algorithms (which we call synthesizers) have been proposed. Some synthesizers satisfy differential privacy, while others aim to provide privacy in a heuristic fashion. A comprehensive understanding of the strengths and weaknesses of these synthesizers remains elusive for two reasons: existing evaluation metrics have drawbacks, and newly developed synthesizers built on diffusion models and large language models have not been compared head-to-head with state-of-the-art marginal-based synthesizers. In this paper, we present a systematic evaluation framework for assessing tabular data synthesis algorithms. Specifically, we examine and critique existing evaluation metrics, and introduce a set of new metrics in terms of fidelity, privacy, and utility to address their limitations. Based on the proposed metrics, we also devise a unified objective for tuning, which can consistently improve the quality of synthetic data for all methods. We conduct extensive evaluations of 8 different types of synthesizers on 12 real-world datasets and report findings that offer new directions for privacy-preserving data synthesis.