SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion

πŸ“… 2026-03-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the limitations of existing watermarking methods for video diffusion models, which rely on non-blind extraction, require storing large key sets, involve costly template matching, and exhibit insufficient robustness to temporal perturbations under causal 3D VAE architectures. To overcome these challenges, we propose SIGMarkβ€”the first blind watermarking framework tailored for video diffusion models. SIGMark embeds watermarks by generating initial noise through global frame-level pseudo-random coding and introduces a Segmented Group Ordering (SGO) module specifically designed to align with causal 3D VAEs, thereby significantly enhancing robustness against spatiotemporal distortions. Experimental results demonstrate that SIGMark achieves high bit-wise extraction accuracy under diverse perturbations while substantially reducing storage and computational overhead, offering both scalability and practical applicability.

πŸ“ Abstract
Artificial Intelligence Generated Content (AIGC), particularly video generation with diffusion models, has advanced rapidly. Invisible watermarking is a key technology for protecting AI-generated videos and tracing harmful content, and thus plays a crucial role in AI safety. Beyond post-processing watermarks, which inevitably degrade video quality, recent studies have proposed distortion-free in-generation watermarking for video diffusion models. However, existing in-generation approaches are non-blind: they must maintain all message-key pairs and perform template-based matching during extraction, which incurs prohibitive computational costs at scale. Moreover, when applied to modern video diffusion models with causal 3D Variational Autoencoders (VAEs), their robustness against temporal disturbance becomes extremely weak. To overcome these challenges, we propose SIGMark, a Scalable In-Generation watermarking framework with blind extraction for video diffusion. To achieve blind extraction, we generate watermarked initial noise using a Global set of Frame-wise PseudoRandom Coding keys (GF-PRC), reducing the cost of storing large-scale information while preserving noise distribution and diversity for distortion-free watermarking. To enhance robustness, we further design a Segment Group-Ordering (SGO) module tailored to causal 3D VAEs, ensuring robust watermark inversion during extraction under temporal disturbance. Comprehensive experiments on modern diffusion models show that SIGMark achieves very high bit accuracy during extraction under both temporal and spatial disturbances with minimal overhead, demonstrating its scalability and robustness. Our project is available at https://jeremyzhao1998.github.io/SIGMark-release/.
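The blind-extraction idea can be illustrated with a toy sketch. This is a simplification and NOT the paper's actual GF-PRC construction: here a single global key deterministically derives a per-frame sign pattern, message bits are embedded in the signs of the initial Gaussian noise while magnitudes stay half-normal (so samples remain N(0, 1)-distributed), and extraction needs only the shared key rather than a stored template per message. The function names and the SHA-256 key derivation are illustrative assumptions.

```python
# Toy sketch of blind keyed-noise watermarking (not the paper's GF-PRC scheme).
import hashlib
import random

def keyed_signs(key: bytes, frame: int, n: int) -> list:
    # Derive a deterministic +/-1 pattern per frame from one global key.
    digest = hashlib.sha256(key + frame.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [1 if rng.random() < 0.5 else -1 for _ in range(n)]

def embed(key: bytes, bits: list, frame: int, n_per_bit: int = 64) -> list:
    # Each message bit keeps (0) or flips (1) the keyed sign pattern for its
    # chunk; magnitudes are half-normal, so samples remain N(0, 1)-distributed.
    rng = random.Random()
    signs = keyed_signs(key, frame, len(bits) * n_per_bit)
    noise = []
    for i, b in enumerate(bits):
        flip = -1 if b else 1
        for j in range(n_per_bit):
            mag = abs(rng.gauss(0.0, 1.0))
            noise.append(flip * signs[i * n_per_bit + j] * mag)
    return noise

def extract(key: bytes, noise: list, frame: int, n_bits: int,
            n_per_bit: int = 64) -> list:
    # Blind extraction: correlate recovered noise signs with the keyed
    # pattern; no per-message template or key set is stored.
    signs = keyed_signs(key, frame, n_bits * n_per_bit)
    bits = []
    for i in range(n_bits):
        corr = sum((1 if noise[i * n_per_bit + j] >= 0 else -1)
                   * signs[i * n_per_bit + j] for j in range(n_per_bit))
        bits.append(1 if corr < 0 else 0)
    return bits

key = b"global-key"          # one shared key for all messages
msg = [1, 0, 1, 1, 0, 0, 1, 0]
z = embed(key, msg, frame=0)
assert extract(key, z, frame=0, n_bits=len(msg)) == msg
```

Because each bit is spread over many samples and read out by a majority-style correlation, moderate perturbation of the recovered noise tends to leave the extracted bits intact; the paper's GF-PRC keys and SGO module address the harder problems of key-storage scalability and temporal robustness under causal 3D VAEs.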
Problem

Research questions and friction points this paper is trying to address.

in-generation watermarking
blind extraction
video diffusion models
temporal robustness
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

blind extraction
in-generation watermarking
video diffusion models
causal 3D VAEs
scalable watermarking
πŸ”Ž Similar Papers
No similar papers found.
Xinjie Zhu, Lenovo Research, Beijing, China
Zijing Zhao, Lenovo Research
Hui Jin, Lenovo Research, Beijing, China
Qingxiao Guo, Lenovo Research, Beijing, China
Yilong Ma, Lenovo Research, Beijing, China
Yunhao Wang, Lenovo Research, Beijing, China
Xiaobing Guo, Lenovo Research, Beijing, China
Weifeng Zhang, Corp VP & Head of Intelligent Computing Lab at Lenovo Research
AI HW/SW Co-Design, Computer Architecture, Heterogeneous Computing, AI/ML, GPU Optimizations