I2VWM: Robust Watermarking for Image to Video Generation

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address source-image traceability and the insufficient robustness of existing unimodal watermarking methods in image-to-video (I2V) generation, this paper proposes I2VWM, the first cross-modal digital watermarking framework tailored to I2V models. Methodologically, the authors introduce the Robust Diffusion Distance metric to quantify how long the watermark signal persists across generated frames; design a video-simulated noise layer to strengthen robustness during training; and add an optical-flow-based alignment module to compensate for inter-frame motion during inference. The framework is trained end-to-end for joint watermark embedding and extraction. Experiments show that I2VWM markedly improves watermark resilience against generative distortions on both open-source and commercial I2V models, while preserving visual imperceptibility, generalizing across architectures, and remaining practical to deploy.
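The page does not give the formal definition of Robust Diffusion Distance, only that it measures how long the watermark signal persists across generated frames. A minimal sketch of one plausible formalization (the function name, threshold, and decoding-accuracy inputs are assumptions, not the paper's definition): count the consecutive generated frames, starting from the source image, whose decoded watermark bit accuracy stays above a threshold.

```python
def robust_diffusion_distance(bit_accuracy, threshold=0.75):
    """Illustrative sketch (not the paper's exact metric): the number of
    consecutive generated frames, counted from the conditioning image,
    whose decoded watermark bit accuracy stays above `threshold`."""
    for t, acc in enumerate(bit_accuracy):
        if acc < threshold:
            return t
    return len(bit_accuracy)

# Toy per-frame decoder accuracies for an 8-frame clip: the watermark
# signal decays as the video diverges from the conditioning image.
acc = [0.99, 0.97, 0.92, 0.85, 0.78, 0.71, 0.64, 0.55]
print(robust_diffusion_distance(acc))  # frames 0-4 exceed 0.75 -> 5
```

Under this reading, a larger distance means the watermark survives deeper into the generated video, which is what the training-time noise layer is meant to improve.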

📝 Abstract
The rapid progress of image-guided video generation (I2V) has raised concerns about its potential misuse in misinformation and fraud, underscoring the urgent need for effective digital watermarking. While existing watermarking methods demonstrate robustness within a single modality, they fail to trace source images in I2V settings. To address this gap, we introduce the concept of Robust Diffusion Distance, which measures the temporal persistence of watermark signals in generated videos. Building on this, we propose I2VWM, a cross-modal watermarking framework designed to enhance watermark robustness across time. I2VWM leverages a video-simulation noise layer during training and employs an optical-flow-based alignment module during inference. Experiments on both open-source and commercial I2V models demonstrate that I2VWM significantly improves robustness while maintaining imperceptibility, establishing a new paradigm for cross-modal watermarking in the era of generative video. Code released: https://github.com/MrCrims/I2VWM-Robust-Watermarking-for-Image-to-Video-Generation
Problem

Research questions and friction points this paper is trying to address.

Addresses misuse of image-to-video generation via robust watermarking
Traces source images in generated videos across modalities
Measures temporal watermark persistence in AI-generated content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Robust Diffusion Distance for temporal watermark persistence
Uses video-simulation noise layer during training for robustness
Employs optical-flow-based alignment module during inference
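The video-simulation noise layer is only named on this page, not specified. A hypothetical sketch of what such a layer might do (the function, parameters, and degradation choices are assumptions, not the paper's design): perturb a watermarked clip with additive noise and temporal blending between neighboring frames to mimic the degradations an I2V model and a video codec introduce, so the extractor is trained against them.

```python
import numpy as np

rng = np.random.default_rng(0)

def video_sim_noise(frames, sigma=0.02, blend=0.3):
    """Hypothetical video-simulated noise layer (illustrative only):
    adds Gaussian noise to every frame, then blends each frame with
    its predecessor to imitate temporal smearing in generated video."""
    frames = np.asarray(frames, dtype=np.float64)
    noisy = frames + rng.normal(0.0, sigma, frames.shape)  # per-pixel noise
    blended = noisy.copy()
    # Mix each frame with the previous one to simulate temporal drift.
    blended[1:] = (1 - blend) * noisy[1:] + blend * noisy[:-1]
    return np.clip(blended, 0.0, 1.0)

clip = rng.random((4, 8, 8, 3))  # 4 frames, 8x8 RGB, values in [0, 1]
out = video_sim_noise(clip)
print(out.shape)                 # (4, 8, 8, 3)
```

In the paper's setting such a layer would sit between the watermark embedder and extractor during training (and be differentiable, e.g. in PyTorch); this NumPy version only illustrates the distortion model.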
Authors
Guanjie Wang
University of Science and Technology of China
Zehua Ma
University of Science and Technology of China
Research interests: Image Watermarking, Image Processing, 3D Printing, Aesthetic 2D Barcode
Han Fang
National University of Singapore
Weiming Zhang
University of Science and Technology of China