TAB-DRW: A DFT-based Robust Watermark for Generative Tabular Data

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing watermarking methods for tracking the provenance of tabular data synthesized by generative AI suffer from high computational overhead, incompatibility with mixed discrete-continuous data, and weak resilience to post-generation edits. To address these challenges, this paper proposes an efficient and robust frequency-domain watermarking scheme. The approach combines the discrete Fourier transform (DFT) with adaptive modulation of imaginary parts and introduces a sorting-based pseudorandom bit generation mechanism, so the watermark can be embedded with zero storage overhead. It also applies the Yeo-Johnson transformation and standardization to support heterogeneous data types. Evaluation on five benchmark datasets shows that the embedded watermark achieves high detection accuracy and strong robustness against common post-processing attacks (including additive noise, row shuffling, and column pruning), while preserving high fidelity to the original tabular data.
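
As a rough illustration of the normalization step, the sketch below applies a per-column Yeo-Johnson transform followed by standardization and keeps the fitted parameters so the mapping can be inverted after watermarking. It is a minimal NumPy sketch under our own assumptions, not the authors' code: the function names are hypothetical, λ is chosen by a coarse grid search over the standard Yeo-Johnson log-likelihood rather than an exact optimizer, and discrete columns are assumed to be integer-encoded already.

```python
import numpy as np


def yeo_johnson(x, lmbda):
    """Forward Yeo-Johnson transform of a 1-D array for a fixed lambda."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lmbda) < 1e-12:
        out[pos] = np.log1p(x[pos])
    else:
        out[pos] = (np.power(x[pos] + 1.0, lmbda) - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        out[~pos] = -np.log1p(-x[~pos])
    else:
        out[~pos] = -(np.power(1.0 - x[~pos], 2.0 - lmbda) - 1.0) / (2.0 - lmbda)
    return out


def inverse_yeo_johnson(y, lmbda):
    """Invert yeo_johnson; the sign of y matches the sign of the original x."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    if abs(lmbda) < 1e-12:
        out[pos] = np.expm1(y[pos])
    else:
        out[pos] = np.power(lmbda * y[pos] + 1.0, 1.0 / lmbda) - 1.0
    if abs(lmbda - 2.0) < 1e-12:
        out[~pos] = -np.expm1(-y[~pos])
    else:
        out[~pos] = 1.0 - np.power(1.0 - (2.0 - lmbda) * y[~pos], 1.0 / (2.0 - lmbda))
    return out


def fit_lambda(x, grid=None):
    """Pick lambda on a coarse grid by maximizing the Yeo-Johnson log-likelihood."""
    x = np.asarray(x, dtype=float)
    if grid is None:
        grid = np.arange(-40, 41) / 20.0  # -2.0, -1.95, ..., 2.0
    jacobian = np.sum(np.sign(x) * np.log1p(np.abs(x)))
    best, best_ll = 1.0, -np.inf  # lambda = 1 is the identity transform
    for lm in grid:
        var = yeo_johnson(x, lm).var()
        if var <= 0:
            continue  # constant column: likelihood undefined, keep the default
        ll = -0.5 * x.size * np.log(var) + (lm - 1.0) * jacobian
        if ll > best_ll:
            best, best_ll = float(lm), ll
    return best


def normalize_columns(table):
    """Yeo-Johnson + standardization per column; returns (Z, params) for inversion."""
    table = np.asarray(table, dtype=float)
    z = np.empty_like(table)
    params = []
    for j in range(table.shape[1]):
        lm = fit_lambda(table[:, j])
        y = yeo_johnson(table[:, j], lm)
        mu, sd = y.mean(), y.std()
        sd = sd if sd > 0 else 1.0  # avoid dividing by zero on constant columns
        z[:, j] = (y - mu) / sd
        params.append((lm, mu, sd))
    return z, params


def denormalize_columns(z, params):
    """Undo standardization, then the Yeo-Johnson transform, column by column."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    for j, (lm, mu, sd) in enumerate(params):
        out[:, j] = inverse_yeo_johnson(z[:, j] * sd + mu, lm)
    return out
```

Keeping `(lambda, mean, std)` per column is what lets a watermarked table in the normalized space be mapped back to the original feature scales.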

📝 Abstract
The rise of generative AI has enabled the production of high-fidelity synthetic tabular data across fields such as healthcare, finance, and public policy, raising growing concerns about data provenance and misuse. Watermarking offers a promising solution to address these concerns by ensuring the traceability of synthetic data, but existing methods face many limitations: they are computationally expensive due to reliance on large diffusion models, struggle with mixed discrete-continuous data, or lack robustness to post-modifications. To address them, we propose TAB-DRW, an efficient and robust post-editing watermarking scheme for generative tabular data. TAB-DRW embeds watermark signals in the frequency domain: it normalizes heterogeneous features via the Yeo-Johnson transformation and standardization, applies the discrete Fourier transform (DFT), and adjusts the imaginary parts of adaptively selected entries according to precomputed pseudorandom bits. To further enhance robustness and efficiency, we introduce a novel rank-based pseudorandom bit generation method that enables row-wise retrieval without incurring storage overhead. Experiments on five benchmark tabular datasets show that TAB-DRW achieves strong detectability and robustness against common post-processing attacks, while preserving high data fidelity and fully supporting mixed-type features.
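
The embedding idea (apply the DFT, then adjust imaginary parts according to pseudorandom bits) can be sketched per row as follows. This is a hypothetical illustration, not TAB-DRW itself: it assumes the DFT is taken across the normalized features of each row, uses the fixed frequencies k = 1..m in place of the paper's adaptive selection, and forces each imaginary part to carry its bit in its sign with an assumed margin `delta`. Mirroring each change onto frequency d - k keeps the spectrum Hermitian, so the inverse DFT returns a real row. Detection recomputes the DFT, counts sign matches, and scores them with a one-proportion z-test against the no-watermark match rate of 1/2.

```python
import numpy as np


def embed_row(z_row, bits, delta=0.5):
    """Encode bits in the signs of Im(X[k]), k = 1..len(bits), of one row's DFT."""
    z_row = np.asarray(z_row, dtype=float)
    d = z_row.size
    if len(bits) > (d - 1) // 2:
        raise ValueError("row has too few non-redundant frequencies for these bits")
    spec = np.fft.fft(z_row)
    for i, b in enumerate(bits):
        k = i + 1  # skip k = 0 (DC term), whose imaginary part is always 0
        target = 1.0 if b else -1.0
        im = spec[k].imag
        # Change the coefficient only if its sign is wrong or its margin is below delta.
        if im * target < delta:
            im = target * delta
        spec[k] = complex(spec[k].real, im)
        spec[d - k] = np.conj(spec[k])  # Hermitian symmetry keeps the inverse DFT real
    return np.fft.ifft(spec).real


def count_matches(z_row, bits):
    """Number of bits whose value agrees with the sign of Im(X[k]) in the row's DFT."""
    spec = np.fft.fft(np.asarray(z_row, dtype=float))
    return sum(int((spec[i + 1].imag > 0) == bool(b)) for i, b in enumerate(bits))


def detection_z(rows, bit_lists):
    """One-proportion z-score of the overall match rate against 1/2 (no watermark)."""
    hits = sum(count_matches(r, b) for r, b in zip(rows, bit_lists))
    n = sum(len(b) for b in bit_lists)
    return (hits - 0.5 * n) / np.sqrt(0.25 * n)
```

The margin `delta` trades fidelity for robustness: small additive noise moves each imaginary part only slightly, so signs pushed at least `delta` away from zero survive it. Detection needs each row's bits again at test time; if they were tied to row positions, row shuffling would break detection.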
Problem

Research questions and friction points this paper is trying to address.

Develops a watermarking method for synthetic tabular data
Ensures traceability and robustness against post-processing attacks
Handles mixed discrete-continuous data efficiently, without relying on large diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

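Embeds the watermark in the frequency domain using the discrete Fourier transform (DFT)
Applies the Yeo-Johnson transformation and standardization to normalize heterogeneous features
Introduces a rank-based pseudorandom bit generation method that requires no storage overhead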
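
One way to read "rank-based pseudorandom bits with row-wise retrieval and no storage" is that each row's bits are recomputed from the row itself. The sketch below assumes exactly that: it hashes the row's value ordering (its argsort) together with a secret key using HMAC-SHA256 and takes the first m bits of the digest. The construction and the names `rank_bits` and `table_bits` are our own assumptions, not the paper's definition. Because the bits depend only on a row's own contents and the key, they are unaffected by row shuffling or by any transformation that preserves the within-row ordering, and nothing has to be stored per row.

```python
import hashlib
import hmac

import numpy as np


def rank_bits(row, key, m):
    """Hypothetical: derive m pseudorandom bits from one row's value ordering."""
    order = np.argsort(np.asarray(row, dtype=float), kind="stable").astype(np.uint16)
    digest = hmac.new(key, order.tobytes(), hashlib.sha256).digest()
    if m > 8 * len(digest):
        raise ValueError("at most 256 bits per row from one SHA-256 digest")
    # Read bit i from byte i // 8, least significant bit first.
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(m)]


def table_bits(table, key, m):
    """Recompute every row's bits on demand; no per-row state is stored."""
    return [rank_bits(r, key, m) for r in np.asarray(table, dtype=float)]
```

A practical scheme would also need the ordering to remain stable through the embedding step itself and under small perturbations; this sketch does not address that.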