MUSE: Model-Agnostic Tabular Watermarking via Multi-Sample Selection

📅 2025-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Tabular diffusion models invert poorly, so DDIM-inversion-based watermarking does not transfer to them directly. Method: The paper proposes a model-agnostic, multi-sample selection watermarking framework that does not depend on model invertibility. It generates multiple candidate samples in parallel, uses a purpose-built scoring function to select the sample that carries the watermark, and derives a theoretical relationship among watermark detectability, candidate count, and dataset size so that watermark strength can be calibrated. Contributions/Results: (1) An invertibility-free watermarking approach for tabular generation that does not rely on DDIM inversion; (2) A theory-guided parameter calibration mechanism; (3) State-of-the-art detection performance (1.0 TPR at 0.1% FPR), higher fidelity (distortion on fidelity metrics reduced by 81–89%), robustness against various attacks, and compatibility with any tabular generative model that supports repeated sampling.

📝 Abstract

We introduce MUSE, a watermarking algorithm for tabular generative models. Previous approaches typically leverage DDIM invertibility to watermark tabular diffusion models, but tabular diffusion models exhibit significantly poorer invertibility compared to other modalities, compromising performance. Simultaneously, tabular diffusion models require substantially less computation than other modalities, enabling a multi-sample selection approach to tabular generative model watermarking. MUSE embeds watermarks by generating multiple candidate samples and selecting one based on a specialized scoring function, without relying on model invertibility. Our theoretical analysis establishes the relationship between watermark detectability, candidate count, and dataset size, allowing precise calibration of watermarking strength. Extensive experiments demonstrate that MUSE achieves state-of-the-art watermark detectability and robustness against various attacks while maintaining data quality, and remains compatible with any tabular generative model supporting repeated sampling, effectively addressing key challenges in tabular data watermarking. Specifically, it reduces the distortion rates on fidelity metrics by 81-89%, while achieving a 1.0 TPR@0.1%FPR detection rate. Implementation of MUSE can be found at https://github.com/fangliancheng/MUSE.
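The selection idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the MUSE repository: `toy_generator` stands in for any tabular model that supports repeated sampling, `keyed_score` is a hypothetical keyed-hash scoring function, and the z-score detector is a generic sum test rather than the paper's detection procedure.

```python
import hashlib
import math
import random


def toy_generator(rng):
    """Hypothetical stand-in for one draw from a tabular generative model."""
    age = round(rng.gauss(40, 12), 2)
    income = round(rng.lognormvariate(10, 0.5), 2)
    segment = rng.choice(["a", "b", "c"])
    return (age, income, segment)


def keyed_score(row, key):
    """Map a row to a pseudorandom score in [0, 1) with a keyed hash (assumed design)."""
    digest = hashlib.blake2b(repr(row).encode(), digest_size=6, key=key).digest()
    return int.from_bytes(digest, "big") / 2**48


def select_watermarked_row(generate, rng, key, m):
    """Draw m candidate rows and keep the one with the highest keyed score."""
    candidates = [generate(rng) for _ in range(m)]
    return max(candidates, key=lambda row: keyed_score(row, key))


def detection_z_score(rows, key):
    """Z-score of the summed scores against the no-watermark null.

    Under the null, scores behave like Uniform(0, 1): mean 1/2, variance 1/12.
    """
    n = len(rows)
    total = sum(keyed_score(row, key) for row in rows)
    return (total - n / 2) / math.sqrt(n / 12)


rng = random.Random(0)
key = b"demo-key"
watermarked = [select_watermarked_row(toy_generator, rng, key, m=4) for _ in range(500)]
plain = [toy_generator(rng) for _ in range(500)]
print("watermarked z:", detection_z_score(watermarked, key))
print("plain z:", detection_z_score(plain, key))
```

Because every kept row is itself a genuine draw from the generator, no inversion of the model is needed at detection time; the detector only needs the key and the released rows. In this toy setup, raising `m` strengthens the signal at the cost of more sampling.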

Problem

Research questions and friction points this paper is trying to address.

Watermarking tabular generative models without relying on model invertibility

Improving watermark detectability and robustness in tabular data

Maintaining data quality while embedding watermarks effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-sample selection for watermarking

No reliance on model invertibility

Specialized scoring function for selection
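
To see why candidate count and dataset size jointly control detectability in a selection-based scheme, consider a simplified toy model. This is an illustrative assumption, not the theorem from the paper: suppose each row receives a score that is Uniform(0, 1) for unwatermarked data, and the watermarked row is the highest-scoring of $m$ independent candidates. For a table of $N$ rows, the expected z-score of the summed scores is then:

```latex
\begin{align*}
\mathbb{E}[s_{\max}] &= \frac{m}{m+1},
\qquad
z = \frac{\sum_{i=1}^{N} s_i - N/2}{\sqrt{N/12}}, \\
\mathbb{E}[z] &= \frac{N\left(\frac{m}{m+1} - \frac{1}{2}\right)}{\sqrt{N/12}}
             = \sqrt{3N}\,\frac{m-1}{m+1}.
\end{align*}
```

Under these assumptions, $m = 4$ and $N = 500$ give $\mathbb{E}[z] = \sqrt{1500} \cdot 0.6 \approx 23.2$. The signal grows with $\sqrt{N}$ and saturates in $m$, which suggests how a small candidate count can already suffice for large tables.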

🔎 Similar Papers

From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models