🤖 AI Summary
To address the lack of training-data provenance in diffusion model fine-tuning and the absence of standardized evaluation criteria for watermarking techniques, this paper introduces the first comprehensive evaluation framework for fine-tuning traceability. The authors propose a unified threat model to systematically assess existing watermarking methods across three critical dimensions: universality, transmissibility, and robustness, with particular attention to realistic adversarial attacks. They further design a black-box watermark removal algorithm that operates without access to the original training data, enabling complete watermark erasure while preserving fine-tuned model performance. Experimental results show that current watermarking methods hold up reasonably well under conventional benchmarks but are consistently vulnerable under practical threat scenarios. This work establishes a reproducible benchmark, advocates a more realistic evaluation paradigm, and delivers critical security insights, thereby advancing the development of trustworthy generative models.
📝 Abstract
Recent fine-tuning techniques for diffusion models enable them to reproduce specific image sets, such as particular faces or artistic styles, but also introduce copyright and security risks. Dataset watermarking has been proposed to ensure traceability by embedding imperceptible watermarks into training images, which remain detectable in model outputs even after fine-tuning. However, current methods lack a unified evaluation framework. To address this gap, the paper establishes a general threat model and introduces a comprehensive evaluation framework encompassing Universality, Transmissibility, and Robustness. Experiments show that existing methods perform well in universality and transmissibility, and exhibit some robustness against common image processing operations, yet still fall short under real-world threat scenarios. To expose these vulnerabilities, the paper further proposes a practical watermark removal method that fully eliminates dataset watermarks without degrading fine-tuning performance, highlighting a key challenge for future research.
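The abstract does not specify how the evaluated schemes embed their watermarks. As a toy illustration of the general embed-then-detect idea only (not any method studied in the paper, and far too fragile to survive diffusion fine-tuning), a naive key-seeded LSB watermark might look like:

```python
import numpy as np

def embed_watermark(image: np.ndarray, key: int) -> np.ndarray:
    """Overwrite each pixel's least significant bit with a key-seeded 0/1 pattern."""
    rng = np.random.default_rng(key)
    pattern = rng.integers(0, 2, size=image.shape, dtype=np.uint8)
    return (image & 0xFE) | pattern  # keep top 7 bits, replace the LSB

def detect_watermark(image: np.ndarray, key: int) -> float:
    """Fraction of LSBs matching the key pattern: ~1.0 if watermarked, ~0.5 otherwise."""
    rng = np.random.default_rng(key)
    pattern = rng.integers(0, 2, size=image.shape, dtype=np.uint8)
    return float(np.mean((image & 1) == pattern))

# Toy usage on a random 8-bit "training image".
img = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
wm = embed_watermark(img, key=42)
score_marked = detect_watermark(wm, key=42)    # exactly 1.0 on the marked image
score_unmarked = detect_watermark(img, key=42) # near 0.5 (chance) on the original
```

Practical dataset watermarks differ precisely in that the signal must persist through the entire fine-tuning pipeline and reappear in generated images, which is what the paper's transmissibility and robustness dimensions measure.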