FEST: A Unified Framework for Evaluating Synthetic Tabular Data

📅 2025-08-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing synthetic tabular data evaluation methods lack a systematic framework that jointly addresses privacy preservation and data utility. This paper proposes FEST—the first unified, scalable synthetic data evaluation framework—designed to holistically quantify privacy–utility trade-offs across multiple dimensions. FEST uniquely integrates adversarial privacy metrics (e.g., membership inference, attribute inference) with distance-based privacy metrics (e.g., Jensen–Shannon divergence), while simultaneously assessing statistical fidelity (distributional similarity, correlation structure) and machine learning utility (downstream task performance). Built upon generative modeling principles, the framework is accompanied by an open-source Python library. Extensive validation on multiple benchmark datasets demonstrates its effectiveness and robustness. FEST enables standardized, cross-model comparative analysis of privacy–utility trade-offs, thereby providing a principled, reproducible tool for trustworthy synthetic data evaluation.
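The summary mentions distance-based privacy metrics such as Jensen–Shannon divergence between real and synthetic distributions. A minimal sketch of that kind of per-column comparison is below; the function name and histogram binning are illustrative assumptions, not FEST's actual API.

```python
# Hedged sketch: a distance-based metric of the kind the summary describes,
# comparing real vs. synthetic marginals via Jensen-Shannon distance.
# `js_distance_per_column` is an illustrative name, not FEST's API.
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_distance_per_column(real: np.ndarray, synth: np.ndarray, bins: int = 20):
    """Per-column Jensen-Shannon distance (0 = identical, 1 = disjoint)."""
    scores = []
    for col in range(real.shape[1]):
        # Shared bin range so the two histograms are directly comparable.
        lo = min(real[:, col].min(), synth[:, col].min())
        hi = max(real[:, col].max(), synth[:, col].max())
        p, _ = np.histogram(real[:, col], bins=bins, range=(lo, hi))
        q, _ = np.histogram(synth[:, col], bins=bins, range=(lo, hi))
        # jensenshannon normalizes the histograms to probability vectors.
        scores.append(float(jensenshannon(p, q)))
    return scores

rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 3))
synth = rng.normal(loc=0.1, size=(1000, 3))  # slightly shifted "synthetic" data
print(js_distance_per_column(real, synth))
```

Lower scores indicate closer marginal distributions; FEST combines such distance-based signals with attack-based privacy metrics.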

📝 Abstract
Synthetic data generation, leveraging generative machine learning techniques, offers a promising approach to mitigating privacy concerns associated with real-world data usage. Synthetic data closely resembles real-world data while maintaining strong privacy guarantees. However, a comprehensive assessment framework for evaluating synthetic data generation is still missing, especially one that considers the balance between privacy preservation and data utility. This research bridges this gap by proposing FEST, a systematic framework for evaluating synthetic tabular data. FEST integrates diverse privacy metrics (attack-based and distance-based), along with similarity and machine learning utility metrics, to provide a holistic assessment. We develop FEST as an open-source Python library and validate it on multiple datasets, demonstrating its effectiveness in analyzing the privacy-utility trade-off of different synthetic data generation models. The source code of FEST is available on GitHub.
Problem

Research questions and friction points this paper is trying to address.

Lack of a comprehensive framework for synthetic data evaluation
Difficulty of balancing privacy preservation with data utility
Assessing the privacy-utility trade-off in synthetic tabular data
Innovation

Methods, ideas, or system contributions that make the work stand out.

A unified framework for evaluating synthetic tabular data
Integration of attack-based and distance-based privacy metrics with utility assessment metrics
An open-source Python library for analyzing the privacy-utility trade-off
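The utility side of the trade-off is typically measured by downstream task performance, as the summary notes. A common pattern is train-on-synthetic, test-on-real (TSTR); the sketch below illustrates it with a toy dataset, and the helper name is an illustrative assumption, not FEST's API.

```python
# Hedged sketch of a machine-learning utility metric of the kind FEST
# integrates: train a classifier on synthetic data, evaluate on real data
# (TSTR). `tstr_accuracy` is an illustrative name, not FEST's API.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def tstr_accuracy(X_synth, y_synth, X_real, y_real) -> float:
    """Train on synthetic data, test on real data; higher = more utility."""
    clf = LogisticRegression(max_iter=1000).fit(X_synth, y_synth)
    return accuracy_score(y_real, clf.predict(X_real))

rng = np.random.default_rng(1)
# Toy stand-ins: two Gaussian classes; "synthetic" is a noisier copy of real.
X_real = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(2, 1, (200, 4))])
y_real = np.array([0] * 200 + [1] * 200)
X_synth = X_real + rng.normal(0, 0.3, X_real.shape)
y_synth = y_real.copy()
print(f"TSTR accuracy: {tstr_accuracy(X_synth, y_synth, X_real, y_real):.3f}")
```

Comparing TSTR scores against a train-on-real baseline, alongside the privacy metrics, gives the cross-model privacy-utility view the framework aims at.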
Weijie Niu
Communication Systems Group CSG, Department of Informatics, University of Zurich UZH, CH–8050 Zurich, Switzerland
Alberto Huertas Celdran
University of Murcia
Cybersecurity, Brain-Computer Interfaces, Federated Learning, Trusted AI
Karoline Siarsky
Communication Systems Group CSG, Department of Informatics, University of Zurich UZH, CH–8050 Zurich, Switzerland
Burkhard Stiller
Communication Systems Group CSG, Department of Informatics, University of Zurich UZH, CH–8050 Zurich, Switzerland