🤖 AI Summary
Problem: Existing tabular diffusion models exhibit poor invertibility, which prevents DDIM-based watermarking from being adapted to them directly.
Method: We propose a model-agnostic, multi-sample selection watermarking framework that does not depend on model invertibility. It generates multiple candidate samples in parallel and uses a customized scoring function to select one of them, which embeds the watermark. It also establishes a theoretical relationship among watermark detectability, candidate count, and dataset size that enables precise control of watermark strength.
Contributions/Results: (1) The first invertibility-free, DDIM-independent watermarking paradigm for tabular generation; (2) a theory-guided parameter calibration mechanism; (3) state-of-the-art detection performance (1.0 TPR at 0.1% FPR), high fidelity (81–89% lower distortion on fidelity metrics), strong robustness against various attacks, and compatibility with any tabular generative model that supports repeated sampling.
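To illustrate how detectability can depend on candidate count and dataset size, here is a worked calculation under an assumed scoring model. This is not the paper's theorem. It assumes that each row's score $s$ is Uniform(0, 1) when no watermark is present. It further assumes that selecting the best of $m$ independent candidates makes the kept row's score the maximum of $m$ such uniforms. Detection is modeled as a z-test on the mean score $\bar{s}$ over $n$ rows.

```latex
% Unwatermarked rows: s ~ Uniform(0,1)
\mathbb{E}[s] = \tfrac{1}{2}, \qquad \operatorname{Var}[s] = \tfrac{1}{12}
% Watermarked rows: s = max of m i.i.d. Uniform(0,1), i.e. Beta(m,1)
\mathbb{E}[s \mid \text{watermark}] = \tfrac{m}{m+1}
% Test statistic over n rows
z = \frac{\bar{s} - \tfrac{1}{2}}{\sqrt{1/(12n)}}
% Expected statistic for watermarked data
\mathbb{E}[z] = \sqrt{12n}\,\Bigl(\tfrac{m}{m+1} - \tfrac{1}{2}\Bigr)
             = \sqrt{12n}\,\frac{m-1}{2(m+1)}
% Example: m = 4, n = 500 gives sqrt(6000) * 3/10, roughly 23.2
```

Under these assumptions, the expected statistic grows like $\sqrt{n}$ and increases with $m$ toward $\sqrt{3n}$. With $m = 1$, selection is a no-op and $\mathbb{E}[z] = 0$. Solving $\mathbb{E}[z] = z^{*}$ gives $n = z^{*2}(m+1)^2 / \bigl(3(m-1)^2\bigr)$. This is the dataset size at which the *expected* statistic reaches a target $z^{*}$; it is not a guarantee on TPR.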
📝 Abstract
We introduce MUSE, a watermarking algorithm for tabular generative models. Previous approaches typically leverage DDIM invertibility to watermark tabular diffusion models. However, tabular diffusion models exhibit significantly poorer invertibility than diffusion models for other modalities, which compromises watermarking performance. At the same time, tabular diffusion models require substantially less computation than models for other modalities, which makes a multi-sample selection approach to watermarking tabular generative models practical. MUSE embeds watermarks by generating multiple candidate samples and selecting one of them according to a specialized scoring function, without relying on model invertibility. Our theoretical analysis establishes the relationship among watermark detectability, candidate count, and dataset size, allowing precise calibration of watermarking strength. Extensive experiments demonstrate that MUSE achieves state-of-the-art watermark detectability and robustness against various attacks while maintaining data quality. It also remains compatible with any tabular generative model that supports repeated sampling, effectively addressing key challenges in tabular data watermarking. Specifically, it reduces distortion rates on fidelity metrics by 81–89% while achieving a detection rate of 1.0 TPR at 0.1% FPR. An implementation of MUSE is available at https://github.com/fangliancheng/MUSE.
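Below is a minimal sketch of the generate-then-select idea, under several assumptions:

- The scoring function is a keyed HMAC-SHA256 hash of each rounded row. The abstract does not specify MUSE's actual scoring function.
- Detection is a one-sided z-test on the mean score.
- The names `keyed_score`, `watermarked_sample`, and `detect`, the secret key, and the stand-in generator are all hypothetical.

It does not reproduce the official implementation or its robustness mechanisms.

```python
import hashlib
import hmac
import math
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def keyed_score(row: Row, key: bytes, decimals: int = 2) -> float:
    """Hypothetical scoring function: map a row to a pseudorandom value in [0, 1).

    Values are rounded first, so tiny numeric perturbations do not change the score.
    """
    payload = ",".join(f"{v:.{decimals}f}" for v in row).encode()
    digest = hmac.new(key, payload, hashlib.sha256).digest()
    # Interpret the first 8 bytes as an unsigned integer and scale to [0, 1).
    return int.from_bytes(digest[:8], "big") / 2**64


def watermarked_sample(sample_row: Callable[[], Row], key: bytes, m: int) -> Row:
    """Draw m candidate rows from the generator and keep the highest-scoring one."""
    candidates = [sample_row() for _ in range(m)]
    return max(candidates, key=lambda r: keyed_score(r, key))


def detect(rows: List[Row], key: bytes) -> float:
    """Return a z-statistic; without the watermark, scores are ~Uniform(0, 1)."""
    n = len(rows)
    mean = sum(keyed_score(r, key) for r in rows) / n
    # Null hypothesis: mean 1/2, variance of the mean 1 / (12 n).
    return (mean - 0.5) / math.sqrt(1.0 / (12.0 * n))


# Usage with a stand-in generator (two numeric columns), not a real tabular model.
rng = random.Random(0)
key = b"hypothetical-secret-key"
gen = lambda: [rng.gauss(0.0, 1.0), rng.uniform(0.0, 10.0)]

marked = [watermarked_sample(gen, key, m=4) for _ in range(500)]
plain = [gen() for _ in range(500)]

z_marked = detect(marked, key)  # expected to be large and positive
z_plain = detect(plain, key)    # expected to be close to 0
print(f"watermarked z = {z_marked:.2f}, unwatermarked z = {z_plain:.2f}")
```

The sketch relies on only one property of the generator: it can be sampled repeatedly, which mirrors the compatibility condition stated above. Raising the candidate count `m` or the number of rows raises the expected z-statistic, while `m=1` makes the selection a no-op.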