On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics

📅 2026-05-07

🤖 AI Summary

This study systematically evaluates the privacy leakage risks of tabular diffusion models used for synthetic data generation. Using membership inference attacks in both black-box and white-box settings, it investigates how training configuration, synthesis choices, and the adversary's prior knowledge jointly affect privacy vulnerability. The findings show that attackers can mount effective membership inference attacks without perfect knowledge of the training setup, without data drawn from an identical distribution, and without massive computational resources. The work also reveals significant limitations of commonly used heuristic privacy metrics, such as distance to closest record (a nearest-neighbor distance), in reliably capturing actual privacy risk. To the best of our knowledge, this is the first empirical analysis on real-world tabular datasets that quantifies the contributions of multiple factors to privacy leakage, offering insights and practical guidance for designing privacy-preserving synthetic data generation systems.
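Distance to closest record (DCR), the nearest-neighbor heuristic mentioned above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: it assumes purely numeric, already-scaled features and Euclidean distance, and the helper names `dcr` and `closer_to_train_share` are hypothetical. The train-versus-holdout comparison shown is one common way the heuristic is read in practice.

```python
import math
from typing import Sequence

Row = Sequence[float]  # assumption: every feature is numeric and already scaled


def dcr(synthetic: list[Row], reference: list[Row]) -> list[float]:
    """Distance to closest record: for each synthetic row, the minimum
    Euclidean distance to any row of the reference set."""
    return [min(math.dist(s, r) for r in reference) for s in synthetic]


def closer_to_train_share(
    synthetic: list[Row], train: list[Row], holdout: list[Row]
) -> float:
    """Hypothetical summary statistic: share of synthetic rows whose DCR to the
    training set is strictly smaller than their DCR to a holdout set."""
    d_train = dcr(synthetic, train)
    d_hold = dcr(synthetic, holdout)
    return sum(a < b for a, b in zip(d_train, d_hold)) / len(synthetic)


if __name__ == "__main__":
    train = [(0.0, 0.0), (1.0, 1.0)]      # toy training rows
    holdout = [(5.0, 5.0), (6.0, 6.0)]    # toy held-out rows
    synthetic = [(0.1, 0.0), (0.9, 1.0)]  # toy synthetic rows near the training data
    print(dcr(synthetic, train))
    print(closer_to_train_share(synthetic, train, holdout))
```

With equally sized train and holdout sets, a share near 0.5 is often read as "no memorization". The abstract reports pitfalls in relying on heuristics of this kind, so such a score is best treated as a rough diagnostic rather than a measure of membership leakage.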

📝 Abstract

Tabular data plays an important role in many fields and industries, including those with elevated privacy considerations and risks. As such, there is a rising interest in generating high-quality synthetic proxies for real tabular data as a means of reducing privacy risk and proprietary data exposure. With tabular diffusion models (TDMs) demonstrating leading performance in synthesizing such data, understanding and measuring the privacy risks associated with these models is imperative. Leveraging state-of-the-art membership inference attacks for TDMs in both black- and white-box settings, this work quantifies the impact of training setup, synthesis choices, and attacker knowledge on privacy leakage. Moreover, the results demonstrate that adversaries need not have perfect knowledge of the training setup, identical data distributions, or massive compute resources to construct successful attacks. Finally, the pitfalls associated with applying heuristic privacy metrics, such as distance-to-closest record, are revealed.
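To make "successful attack" concrete, the sketch below scores candidate records with a deliberately simple black-box membership score and evaluates it with ROC AUC and true-positive rate at a fixed low false-positive rate, two metrics commonly reported for membership inference. It is not one of the state-of-the-art TDM attacks the paper uses: `distance_score` is a hypothetical helper that only looks at the released synthetic records and treats a candidate lying close to some synthetic record as a likely member, assuming numeric, pre-scaled features.

```python
import math
from typing import Sequence

Row = Sequence[float]  # assumption: numeric, pre-scaled features


def distance_score(candidate: Row, synthetic: list[Row]) -> float:
    """Hypothetical black-box membership score (higher = more likely a member):
    the negated distance from the candidate to its nearest synthetic record."""
    return -min(math.dist(candidate, s) for s in synthetic)


def roc_auc(member_scores: list[float], nonmember_scores: list[float]) -> float:
    """AUC as the probability that a random member outscores a random
    non-member, counting ties as one half."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(
    member_scores: list[float], nonmember_scores: list[float], max_fpr: float = 0.01
) -> float:
    """Highest true-positive rate over thresholds whose false-positive rate is
    at most max_fpr; a record is flagged as a member when score >= threshold."""
    best = 0.0  # a threshold above every score flags nobody: TPR = FPR = 0
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    synthetic = [(0.0, 0.0), (1.0, 1.0)]     # toy released synthetic rows
    members = [(0.1, 0.0), (0.9, 1.1)]       # toy training rows
    nonmembers = [(3.0, 3.0), (0.5, 0.5)]    # toy held-out rows
    m_scores = [distance_score(x, synthetic) for x in members]
    n_scores = [distance_score(x, synthetic) for x in nonmembers]
    print(roc_auc(m_scores, n_scores), tpr_at_fpr(m_scores, n_scores, max_fpr=0.0))
```

Reporting TPR at a low FPR alongside AUC matters because an attack can have a modest average-case AUC yet still identify a small set of members with high confidence.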

Problem

Research questions and friction points this paper is trying to address.

privacy leakage

tabular diffusion models

membership inference attacks

synthetic data

privacy metrics

Innovation

Methods, ideas, or system contributions that make the work stand out.

tabular diffusion models

privacy leakage

membership inference attacks