🤖 AI Summary
To address provenance and attribution of AI-generated images at internet scale, this paper presents SynthID-Image, a deep learning-based invisible watermarking system deployed in production. It documents the technical desiderata, threat models, and practical deployment challenges, framed around four requirements: effectiveness, fidelity, robustness, and security. Key points include: (1) deployment across Google's services, where the system has watermarked over ten billion images and video frames; (2) a corresponding verification service, currently available to trusted testers; and (3) an experimental evaluation of SynthID-O, an external model variant available through partnerships. In benchmarks against other post-hoc watermarking methods from the literature, SynthID-O achieves state-of-the-art visual quality and robustness to common image perturbations. The conclusions on deployment, constraints, and threat modeling are reported to generalize to other modalities, including audio.
📝 Abstract
We introduce SynthID-Image, a deep learning-based system for invisibly watermarking AI-generated imagery. This paper documents the technical desiderata, threat models, and practical challenges of deploying such a system at internet scale, addressing key requirements of effectiveness, fidelity, robustness, and security. SynthID-Image has been used to watermark over ten billion images and video frames across Google's services, and its corresponding verification service is available to trusted testers. For completeness, we present an experimental evaluation of an external model variant, SynthID-O, which is available through partnerships. We benchmark SynthID-O against other post-hoc watermarking methods from the literature, demonstrating state-of-the-art performance in both visual quality and robustness to common image perturbations. While this work centers on visual media, the conclusions on deployment, constraints, and threat modeling generalize to other modalities, including audio. This paper provides comprehensive documentation for the large-scale deployment of deep learning-based media provenance systems.
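To make the fidelity and robustness criteria concrete, the sketch below shows how a post-hoc watermark might be scored on both axes. It is not the SynthID-Image or SynthID-O method, which is a learned system whose internals are not described here. It substitutes a classical additive spread-spectrum watermark, and every name and parameter in it (`embed`, `detect_score`, `strength`, the key value, the noise level) is a hypothetical chosen for illustration. Fidelity is measured with PSNR, a common visual quality metric, and robustness is probed with a single perturbation, additive Gaussian noise.

```python
import math
import random

def psnr(original, modified, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def make_pattern(key, n):
    """Pseudo-random +/-1 pattern derived from a secret key (toy stand-in)."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(pixels, key, strength=2.0):
    """Add the keyed pattern to the pixels, clipping to the valid range."""
    pattern = make_pattern(key, len(pixels))
    return [min(255.0, max(0.0, p + strength * w)) for p, w in zip(pixels, pattern)]

def detect_score(pixels, key):
    """Correlate mean-removed pixels with the keyed pattern."""
    pattern = make_pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) * w for p, w in zip(pixels, pattern)) / len(pixels)

def add_noise(pixels, sigma, seed):
    """Perturbation used as a robustness probe: additive Gaussian noise."""
    rng = random.Random(seed)
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]

# Synthetic 64x64 grayscale "image", flattened to a list of floats.
rng = random.Random(0)
image = [float(rng.randint(100, 156)) for _ in range(64 * 64)]

marked = embed(image, key=1234)

# Fidelity: how close the watermarked image is to the original.
quality_db = psnr(image, marked)

# Robustness: compare detector scores after the same perturbation is applied
# to the watermarked image and to the unwatermarked one.
score_marked = detect_score(add_noise(marked, sigma=5.0, seed=1), key=1234)
score_clean = detect_score(add_noise(image, sigma=5.0, seed=1), key=1234)

print(f"PSNR: {quality_db:.2f} dB")
print(f"score (watermarked, noisy): {score_marked:.3f}")
print(f"score (clean, noisy):       {score_clean:.3f}")
```

A real benchmark would sweep many perturbations (cropping, compression, resampling) and report detection rates at a fixed false-positive rate. The toy detector here only shows that the score gap between watermarked and clean content is the quantity of interest.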