AI Summary
Existing deep image watermarking methods optimize the embedder and extractor independently, coupling them only weakly via the final loss, and so lack decoding-aware guidance and collaborative learning. This paper proposes a bidirectional mutual-teacher framework that models the embedder and extractor as interactive, mutually supervising modules for end-to-end joint optimization. Key contributions include: (1) a Collaborative Interaction Mechanism (CIM) enabling bidirectional feature-level information exchange; (2) an Adaptive Feature Modulation Module (AFMM) for content-aware robust representation learning; and (3) a decoupled feature regulation strategy that explicitly separates content and watermark representations. Experiments demonstrate substantial improvements in watermark extraction accuracy on both natural and AI-generated images, while maintaining high visual fidelity, strong robustness against common distortions, and superior cross-domain generalization.
Abstract
Existing deep image watermarking methods follow a fixed embedding-distortion-extraction pipeline, where the embedder and extractor are weakly coupled through a final loss and optimized in isolation. This design lacks explicit collaboration, leaving no structured mechanism for the embedder to incorporate decoding-aware cues or for the extractor to guide embedding during training. To address this architectural limitation, we rethink deep image watermarking by reformulating embedding and extraction as explicitly collaborative components. To realize this reformulation, we introduce a Collaborative Interaction Mechanism (CIM) that establishes direct, bidirectional communication between the embedder and extractor, enabling a mutual-teacher training paradigm and coordinated optimization. Built upon this explicitly collaborative architecture, we further propose an Adaptive Feature Modulation Module (AFMM) to support effective interaction. AFMM enables content-aware feature regulation by decoupling modulation structure and strength, guiding watermark embedding toward stable image features while suppressing host interference during extraction. Under CIM, the AFMMs on both sides form a closed-loop collaboration that aligns embedding behavior with extraction objectives. This architecture-level redesign changes how robustness is learned in watermarking systems. Rather than relying on exhaustive distortion simulation, robustness emerges from coordinated representation learning between embedding and extraction. Experiments on real-world and AI-generated datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches in watermark extraction accuracy while maintaining high perceptual quality, showing strong robustness and generalization.
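The phrase "decoupling modulation structure and strength" can be illustrated with a toy example. The sketch below is one possible reading and not the paper's implementation: the function `modulate`, the softmax-normalized structure pattern, and the single sigmoid strength gate are all assumptions introduced here for illustration. It shows that when the spatial pattern (where the watermark signal goes) and the overall gain (how much is added) are parameterized separately, changing the gain rescales the perturbation without altering its relative layout.

```python
import math

def softmax(scores):
    """Normalize scores into a non-negative pattern that sums to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Map a real-valued logit to a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def modulate(features, watermark, structure_scores, strength_logit):
    """Hypothetical decoupled modulation (an assumption, not the paper's AFMM).

    structure: WHERE the watermark signal is placed (normalized pattern).
    strength:  HOW MUCH signal is added overall (one scalar gate).
    """
    structure = softmax(structure_scores)
    strength = sigmoid(strength_logit)
    return [f + strength * s * w
            for f, s, w in zip(features, structure, watermark)]

# Toy 1-D "feature map" and a +/-1 watermark signal (illustrative values).
features = [0.5, -1.0, 2.0, 0.0]
watermark = [1.0, -1.0, 1.0, -1.0]
structure_scores = [0.0, 0.0, 2.0, 2.0]  # favors the last two positions

# Same structure, two different strengths.
weak = modulate(features, watermark, structure_scores, strength_logit=-2.0)
strong = modulate(features, watermark, structure_scores, strength_logit=2.0)
```

Comparing `weak` and `strong` against `features` shows perturbations that differ only by a constant factor, which is the property a decoupled design would make available to both the embedding side and the extraction side.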