🤖 AI Summary
This work proposes DiM, a Dimension-aware Mapping framework that unifies deep watermarking methods. These methods share similar encoder-decoder structures, yet they differ in functionality and lack a common theoretical foundation. DiM models watermarks as payloads of different dimensionalities (1D messages, 2D masks, and 3D spatiotemporal structures). It also introduces a configurable dimension-mapping mechanism that standardizes both embedding and extraction. For the first time, the framework explains and integrates multiple watermarking behaviors through dimension configuration. Same-dimensional mappings preserve payload structure, and cross-dimensional mappings enable localization. In the video domain, the same system can be switched between tasks by changing only the dimension configuration, with no architectural changes. These tasks are spatiotemporal tamper localization, localized embedding control, and recovery of temporal order under frame disruptions.
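To make "dimension configuration" concrete, the sketch below encodes the payload shapes named above for a video of T frames of H × W pixels. It also assigns a coarse behavior label to each (embedding dimension, extraction dimension) pair, following the same-dimensional versus cross-dimensional split. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `payload_shape` and `DimConfig`, the default message length `L = 32`, and the "uncharacterized" label for mappings that lower dimensionality are all hypothetical.

```python
from dataclasses import dataclass

VALID_DIMS = (1, 2, 3)


def payload_shape(dim: int, T: int, H: int, W: int, L: int = 32) -> tuple:
    """Shape of a watermark payload of the given dimensionality (hypothetical helper).

    L is an illustrative message length, not a value taken from the paper.
    """
    shapes = {
        1: (L,),       # 1D binary message
        2: (H, W),     # 2D spatial mask
        3: (T, H, W),  # 3D spatiotemporal structure
    }
    if dim not in shapes:
        raise ValueError(f"unsupported payload dimension: {dim}")
    return shapes[dim]


@dataclass(frozen=True)
class DimConfig:
    """Hypothetical (embedding dim, extraction dim) pair."""
    embed_dim: int    # dimensionality of the payload given to the encoder
    extract_dim: int  # dimensionality of what the decoder outputs

    def __post_init__(self):
        if self.embed_dim not in VALID_DIMS or self.extract_dim not in VALID_DIMS:
            raise ValueError("dimensions must be 1, 2, or 3")

    def behavior(self) -> str:
        """Coarse label following the same- vs cross-dimensional split."""
        if self.embed_dim == self.extract_dim:
            return "structure-preserving"
        if self.extract_dim > self.embed_dim:
            # Extracting into more dimensions than were embedded localizes the payload.
            return "spatial localization" if self.extract_dim == 2 else "spatiotemporal localization"
        # Mappings that reduce dimensionality are not described in the summary.
        return "uncharacterized"


for cfg in (DimConfig(1, 1), DimConfig(1, 2), DimConfig(1, 3), DimConfig(3, 3)):
    print(cfg.embed_dim, "->", cfg.extract_dim, ":", cfg.behavior())
```

In this framing, the encoder-decoder backbone stays fixed and only the `DimConfig` pair changes. That mirrors the claim that behavior follows from dimension configuration rather than from architecture.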
📝 Abstract
Deep watermarking methods often share similar encoder-decoder architectures, yet differ substantially in their functional behaviors. We propose DiM, a new multi-dimensional watermarking framework that formulates watermarking as a dimension-aware mapping problem, thereby unifying existing watermarking methods at the functional level. Under DiM, watermark information is modeled as payloads of different dimensionalities, including one-dimensional binary messages, two-dimensional spatial masks, and three-dimensional spatiotemporal structures. We find that the dimensional configuration of embedding and extraction largely determines the resulting watermarking behavior. Same-dimensional mappings preserve payload structure and support fine-grained control, while cross-dimensional mappings enable spatial or spatiotemporal localization. We instantiate DiM in the video domain, where spatiotemporal representations enable a broader set of dimension mappings. Experiments demonstrate that varying only the embedding and extraction dimensions, without architectural changes, leads to different watermarking capabilities, including spatiotemporal tamper localization, local embedding control, and recovery of temporal order under frame disruptions.
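The abstract states that cross-dimensional mappings enable spatiotemporal localization. One way to picture this is a 1D message embedded into the video and extracted as a 3D map. In that map, every location (t, h, w) carries its own L-bit estimate of the message. Locations whose estimate disagrees with the reference are then flagged as tampered. The toy sketch below assumes that per-location decoder output format, an arbitrary bit-error threshold of 0.25, and simulated tampering by bit flipping. None of these details come from the paper, and `localize_tampering` is a hypothetical name.

```python
import numpy as np


def localize_tampering(decoded_bits: np.ndarray, message: np.ndarray,
                       threshold: float = 0.25) -> np.ndarray:
    """Turn per-location message estimates into a (T, H, W) tamper map (toy sketch).

    decoded_bits: (T, H, W, L) hard bit predictions, one L-bit estimate per location.
    message:      (L,) reference 1D message that was embedded.
    threshold:    assumed bit-error rate above which a location is flagged.
    """
    if decoded_bits.shape[-1] != message.shape[0]:
        raise ValueError("message length mismatch")
    # The 1D message broadcasts over (T, H, W); average mismatches over the L bits.
    ber = np.mean(decoded_bits != message, axis=-1)
    return ber > threshold


rng = np.random.default_rng(0)
T, H, W, L = 4, 8, 8, 16
message = rng.integers(0, 2, size=L)

# Simulated output on an untouched video: every location recovers the message exactly.
decoded = np.broadcast_to(message, (T, H, W, L)).copy()
# Simulated tampering of frames 1-2 in the top-left 4x4 region: those bits come back flipped.
decoded[1:3, :4, :4] = 1 - decoded[1:3, :4, :4]

tamper_map = localize_tampering(decoded, message)
print(tamper_map.shape, int(tamper_map.sum()))  # (4, 8, 8) 32
```

The flagged region spans both space (the 4 × 4 patch) and time (frames 1 and 2). This is the sense in which extracting into a higher-dimensional output localizes tampering, where a plain 1D readout would give only a single pass/fail verdict.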