🤖 AI Summary
To address the lack of training-data provenance in diffusion model fine-tuning and the absence of standardized evaluation criteria for watermarking techniques, this paper introduces the first comprehensive evaluation framework for fine-tuning traceability. The authors propose a unified threat model to systematically assess existing watermarking methods across three critical dimensions: universality, transmissibility, and robustness, with particular attention to realistic adversarial attacks. They further design a black-box watermark removal algorithm that operates without access to the original training data, enabling complete watermark erasure while preserving fine-tuned model performance. Experimental results show that current watermarking methods hold up reasonably well under conventional benchmarks but are consistently vulnerable under practical threat scenarios. This work establishes a reproducible benchmark, advocates a more realistic evaluation paradigm, and delivers critical security insights, thereby advancing the development of trustworthy generative models.
📝 Abstract
Recent fine-tuning techniques for diffusion models enable them to reproduce specific image sets, such as particular faces or artistic styles, but also introduce copyright and security risks. Dataset watermarking has been proposed to ensure traceability by embedding imperceptible watermarks into training images, which remain detectable in model outputs even after fine-tuning. However, current methods lack a unified evaluation framework. To address this gap, the paper establishes a general threat model and introduces a comprehensive evaluation framework encompassing Universality, Transmissibility, and Robustness. Experiments show that existing methods perform well in universality and transmissibility, and exhibit some robustness against common image processing operations, yet still fall short under real-world threat scenarios. To expose these vulnerabilities, the paper further proposes a practical watermark removal method that fully eliminates dataset watermarks without degrading fine-tuning performance, highlighting a key challenge for future research.
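The abstract does not specify how the evaluated schemes embed their watermarks. As a toy illustration of the general embed-then-detect idea only (not any method studied in the paper, and far too fragile to survive diffusion fine-tuning), a naive key-seeded LSB watermark might look like:

```python
import numpy as np

def embed_watermark(image: np.ndarray, key: int) -> np.ndarray:
    """Overwrite each pixel's least significant bit with a key-seeded 0/1 pattern."""
    rng = np.random.default_rng(key)
    pattern = rng.integers(0, 2, size=image.shape, dtype=np.uint8)
    return (image & 0xFE) | pattern  # keep top 7 bits, replace the LSB

def detect_watermark(image: np.ndarray, key: int) -> float:
    """Fraction of LSBs matching the key pattern: ~1.0 if watermarked, ~0.5 otherwise."""
    rng = np.random.default_rng(key)
    pattern = rng.integers(0, 2, size=image.shape, dtype=np.uint8)
    return float(np.mean((image & 1) == pattern))

# Toy usage on a random 8-bit "training image".
img = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
wm = embed_watermark(img, key=42)
score_marked = detect_watermark(wm, key=42)    # exactly 1.0 on the marked image
score_unmarked = detect_watermark(img, key=42) # near 0.5 (chance) on the original
```

Practical dataset watermarks differ precisely in that the signal must persist through the entire fine-tuning pipeline and reappear in generated images, which is what the paper's transmissibility and robustness dimensions measure.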