AI Summary
This work addresses content misuse in which AI-generated images are paired with harmful text, a combination that often evades conventional moderation because the images carry no traceable metadata. The authors propose an end-to-end forensic framework that embeds a cryptographically signed watermark during image generation and integrates multimodal harmful-content detection to trigger provenance verification. By co-designing steganography with multimodal harm detection, the approach establishes a cross-modal, triggerable accountability mechanism for AI-generated content. Experiments show that the spread-spectrum watermark in the wavelet domain is strongly robust to blurring perturbations, while the CLIP-based multimodal detector achieves an AUC-ROC of 0.99, substantially improving the reliability of content attribution.
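The wavelet-domain spread-spectrum idea can be illustrated with a minimal, dependency-free sketch. Everything here is an illustrative assumption, not the paper's implementation: the one-level Haar transform, the function names, and the 1-D signal stand in for the real 2-D image pipeline. Embedding adds a key-derived ±1 chip sequence to the detail coefficients; detection correlates the detail coefficients against the regenerated chips.

```python
import random

def haar_dwt_1d(x):
    # One-level Haar transform: per-pair averages (approximation)
    # and half-differences (detail).
    approx = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return approx, detail

def haar_idwt_1d(approx, detail):
    # Inverse of the transform above: x0 = a + d, x1 = a - d.
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

def embed(signal, key, strength=2.0):
    # Spread-spectrum embedding: add a key-seeded pseudorandom +/-1
    # chip sequence, scaled by `strength`, to the detail coefficients.
    approx, detail = haar_dwt_1d(signal)
    rng = random.Random(key)
    chips = [rng.choice((-1.0, 1.0)) for _ in detail]
    detail = [d + strength * c for d, c in zip(detail, chips)]
    return haar_idwt_1d(approx, detail)

def detect(signal, key):
    # Correlation detector: regenerate the chips from the key and
    # average their product with the observed detail coefficients.
    # For a watermarked signal this is near `strength`; near 0 otherwise.
    _, detail = haar_dwt_1d(signal)
    rng = random.Random(key)
    chips = [rng.choice((-1.0, 1.0)) for _ in detail]
    return sum(d * c for d, c in zip(detail, chips)) / len(detail)
```

Because the chips are zero-mean and key-dependent, the correlation survives mild low-pass distortion of the detail band far better than a fragile LSB scheme would, which is the intuition behind the blur-robustness result reported above.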
Abstract
The rapid growth of generative AI has introduced new challenges in content moderation and digital forensics. In particular, benign AI-generated images can be paired with harmful or misleading text, creating misuse that is difficult to detect. This contextual misuse undermines traditional moderation frameworks and complicates attribution, as synthetic images typically lack persistent metadata or device signatures. We introduce a steganography-enabled attribution framework that embeds cryptographically signed identifiers into images at creation time and uses multimodal harmful-content detection as a trigger for attribution verification. Our system evaluates five watermarking methods across the spatial, frequency, and wavelet domains, and integrates a CLIP-based fusion model for multimodal harmful-content detection. Experiments demonstrate that spread-spectrum watermarking, especially in the wavelet domain, provides strong robustness under blur distortions, and our multimodal fusion detector achieves an AUC-ROC of 0.99, enabling reliable cross-modal attribution verification. Together, these components form an end-to-end forensic pipeline for tracing harmful deployments of AI-generated imagery, supporting accountability in modern synthetic media environments. Our code is available at GitHub: https://github.com/bli1/steganography
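The "cryptographically signed identifier" step can be sketched as follows. This is a stand-alone illustration, not the authors' scheme: the payload fields, function names, and the use of HMAC-SHA256 are assumptions (a production system would more likely use an asymmetric signature such as Ed25519, so that verifiers need no shared secret), and the timestamp is fixed for reproducibility.

```python
import hashlib
import hmac
import json

def sign_identifier(secret_key: bytes, model_id: str, image_hash: str) -> dict:
    # Hypothetical provenance payload; field names are illustrative.
    payload = {
        "model_id": model_id,
        "image_hash": image_hash,
        "issued_at": 1700000000,  # fixed epoch timestamp for reproducibility
    }
    # Canonical JSON serialization so signing and verification agree
    # byte-for-byte regardless of dict ordering.
    message = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}

def verify_identifier(secret_key: bytes, token: dict) -> bool:
    # Recompute the tag over the canonical payload and compare in
    # constant time; any tampering with the payload invalidates it.
    message = json.dumps(token["payload"], sort_keys=True).encode()
    expected = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["signature"])
```

In the pipeline described above, a token like this would be serialized and carried by the image watermark; the multimodal harmful-content detector then acts as the trigger that extracts and verifies it during attribution.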