Video Signature: In-generation Watermarking for Latent Video Diffusion Models

📅 2025-05-31

📈 Citations: 0

✨ Influential: 0

career value

185K/year

🤖 AI Summary

To address the absence of generative video watermarking and the quality degradation caused by post-hoc embedding, this paper proposes, for the first time, an implicit, adaptive watermarking paradigm embedded within the latent-space diffusion generation process. Methodologically: (1) we design a Perturbation-Aware Suppression (PAS) mechanism with perception-sensitive layer freezing to balance watermark robustness and visual fidelity; (2) we introduce a lightweight temporal alignment module to ensure inter-frame consistency; and (3) we jointly optimize the implicit watermark encoder-decoder and diffusion model fine-tuning. Experiments demonstrate that our method outperforms existing approaches across extraction accuracy, PSNR/SSIM, and inference speed. It achieves over 92% robust watermark recovery under spatiotemporal attacks—including cropping and frame dropping—significantly enhancing practical utility for intellectual property protection and content traceability.

Technology Category

Application Category

📝 Abstract

The rapid development of Artificial Intelligence Generated Content (AIGC) has led to significant progress in video generation but also raises serious concerns about intellectual property protection and reliable content tracing. Watermarking is a widely adopted solution to this issue, but existing methods for video generation mainly follow a post-generation paradigm, which introduces additional computational overhead and often fails to effectively balance the trade-off between video quality and watermark extraction. To address these issues, we propose Video Signature (VIDSIG), an in-generation watermarking method for latent video diffusion models, which enables implicit and adaptive watermark integration during generation. Specifically, we achieve this by partially fine-tuning the latent decoder, where Perturbation-Aware Suppression (PAS) pre-identifies and freezes perceptually sensitive layers to preserve visual quality. Beyond spatial fidelity, we further enhance temporal consistency by introducing a lightweight Temporal Alignment module that guides the decoder to generate coherent frame sequences during fine-tuning. Experimental results show that VIDSIG achieves the best overall performance in watermark extraction, visual quality, and generation efficiency. It also demonstrates strong robustness against both spatial and temporal tampering, highlighting its practicality in real-world scenarios.

Problem

Research questions and friction points this paper is trying to address.

Protecting intellectual property in AI-generated video content

Balancing video quality and watermark extraction efficiency

Ensuring temporal consistency in watermarked video generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

In-generation watermarking for latent diffusion models

Perturbation-Aware Suppression preserves visual quality

Temporal Alignment module enhances frame consistency

🔎 Similar Papers

LaWa: Using Latent Space for In-Generation Image Watermarking