AI Summary
This work addresses content misuse in which AI-generated images are paired with harmful text, a combination that often evades conventional moderation because the images carry no traceable metadata. The authors propose an end-to-end forensic framework that embeds a cryptographically signed watermark during image generation and integrates multimodal harmful-content detection to trigger provenance verification. By co-designing steganography with multimodal harm detection, the approach establishes a cross-modal, triggerable accountability mechanism for AI-generated content. Experiments show that the spread-spectrum watermark in the wavelet domain is strongly robust to blurring perturbations, while the CLIP-based multimodal detector achieves an AUC-ROC of 0.99, substantially improving the reliability of content attribution.
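The wavelet-domain spread-spectrum idea can be illustrated with a minimal, dependency-free sketch. Everything here is an illustrative assumption, not the paper's implementation: the one-level Haar transform, the function names, and the 1-D signal stand in for the real 2-D image pipeline. Embedding adds a key-derived ±1 chip sequence to the detail coefficients; detection correlates the detail coefficients against the regenerated chips.

```python
import random

def haar_dwt_1d(x):
    # One-level Haar transform: per-pair averages (approximation)
    # and half-differences (detail).
    approx = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return approx, detail

def haar_idwt_1d(approx, detail):
    # Inverse of the transform above: x0 = a + d, x1 = a - d.
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

def embed(signal, key, strength=2.0):
    # Spread-spectrum embedding: add a key-seeded pseudorandom +/-1
    # chip sequence, scaled by `strength`, to the detail coefficients.
    approx, detail = haar_dwt_1d(signal)
    rng = random.Random(key)
    chips = [rng.choice((-1.0, 1.0)) for _ in detail]
    detail = [d + strength * c for d, c in zip(detail, chips)]
    return haar_idwt_1d(approx, detail)

def detect(signal, key):
    # Correlation detector: regenerate the chips from the key and
    # average their product with the observed detail coefficients.
    # For a watermarked signal this is near `strength`; near 0 otherwise.
    _, detail = haar_dwt_1d(signal)
    rng = random.Random(key)
    chips = [rng.choice((-1.0, 1.0)) for _ in detail]
    return sum(d * c for d, c in zip(detail, chips)) / len(detail)
```

Because the chips are zero-mean and key-dependent, the correlation survives mild low-pass distortion of the detail band far better than a fragile LSB scheme would, which is the intuition behind the blur-robustness result reported above.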
Abstract
The rapid growth of generative AI has introduced new challenges in content moderation and digital forensics. In particular, benign AI-generated images can be paired with harmful or misleading text, creating misuse that is difficult to detect. This contextual misuse undermines traditional moderation frameworks and complicates attribution, as synthetic images typically lack persistent metadata or device signatures. We introduce a steganography-enabled attribution framework that embeds cryptographically signed identifiers into images at creation time and uses multimodal harmful-content detection as a trigger for attribution verification. Our system evaluates five watermarking methods across the spatial, frequency, and wavelet domains, and integrates a CLIP-based fusion model for multimodal harmful-content detection. Experiments demonstrate that spread-spectrum watermarking, especially in the wavelet domain, provides strong robustness under blur distortions, and our multimodal fusion detector achieves an AUC-ROC of 0.99, enabling reliable cross-modal attribution verification. Together, these components form an end-to-end forensic pipeline for tracing harmful deployments of AI-generated imagery, supporting accountability in modern synthetic media environments. Our code is available at GitHub: https://github.com/bli1/steganography
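The "cryptographically signed identifier" step can be sketched as follows. This is a stand-alone illustration, not the authors' scheme: the payload fields, function names, and the use of HMAC-SHA256 are assumptions (a production system would more likely use an asymmetric signature such as Ed25519, so that verifiers need no shared secret), and the timestamp is fixed for reproducibility.

```python
import hashlib
import hmac
import json

def sign_identifier(secret_key: bytes, model_id: str, image_hash: str) -> dict:
    # Hypothetical provenance payload; field names are illustrative.
    payload = {
        "model_id": model_id,
        "image_hash": image_hash,
        "issued_at": 1700000000,  # fixed epoch timestamp for reproducibility
    }
    # Canonical JSON serialization so signing and verification agree
    # byte-for-byte regardless of dict ordering.
    message = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}

def verify_identifier(secret_key: bytes, token: dict) -> bool:
    # Recompute the tag over the canonical payload and compare in
    # constant time; any tampering with the payload invalidates it.
    message = json.dumps(token["payload"], sort_keys=True).encode()
    expected = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["signature"])
```

In the pipeline described above, a token like this would be serialized and carried by the image watermark; the multimodal harmful-content detector then acts as the trigger that extracts and verifies it during attribution.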