DeepForgeSeal: Latent Space-Driven Semi-Fragile Watermarking for Deepfake Detection Using Multi-Agent Adversarial Reinforcement Learning

📅 2025-11-07
🤖 AI Summary
To address the poor generalizability of passive deepfake detection methods and the fundamental trade-off in existing watermarking schemes (the difficulty of achieving robustness against benign distortions and sensitivity to malicious manipulations at the same time), this paper proposes a latent-space-based semi-fragile watermarking framework. The method combines latent encoding from deep generative models, differentiable watermark embedding and extraction networks, and Multi-Agent Adversarial Reinforcement Learning (MAARL), in which an adversarial attacker agent simulates a dynamic curriculum of benign and malicious image manipulations so that the watermarking agent learns to balance robustness and fragility. Its key contribution is the first application of MAARL to semi-fragile watermark design, enabling active, adaptive synthetic media identification. Evaluated on CelebA and CelebA-HQ, the framework improves on state-of-the-art approaches by over 4.5% and 5.3%, respectively, under challenging manipulation scenarios.

📝 Abstract
Rapid advances in generative AI have led to increasingly realistic deepfakes, posing growing challenges for law enforcement and public trust. Existing passive deepfake detectors struggle to keep pace, largely due to their dependence on specific forgery artifacts, which limits their ability to generalize to new deepfake types. Proactive deepfake detection using watermarks has emerged to address the challenge of identifying high-quality synthetic media. However, these methods often struggle to balance robustness against benign distortions with sensitivity to malicious tampering. This paper introduces a novel deep learning framework that harnesses high-dimensional latent space representations and the Multi-Agent Adversarial Reinforcement Learning (MAARL) paradigm to develop a robust and adaptive watermarking approach. Specifically, we develop a learnable watermark embedder that operates in the latent space, capturing high-level image semantics, while offering precise control over message encoding and extraction. The MAARL paradigm empowers the learnable watermarking agent to pursue an optimal balance between robustness and fragility by interacting with a dynamic curriculum of benign and malicious image manipulations simulated by an adversarial attacker agent. Comprehensive evaluations on the CelebA and CelebA-HQ benchmarks reveal that our method consistently outperforms state-of-the-art approaches, achieving improvements of over 4.5% on CelebA and more than 5.3% on CelebA-HQ under challenging manipulation scenarios.
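The semi-fragile behaviour described in the abstract can be illustrated with a minimal sketch of the verification step. This is not the paper's implementation: the function names, the 8-bit message, and the 0.85 threshold are assumptions chosen for illustration, and the extracted bit strings are hand-written stand-ins for what a learned extractor would return.

```python
from typing import Sequence


def bit_accuracy(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Fraction of watermark bits recovered correctly (hypothetical helper)."""
    if not embedded or len(embedded) != len(extracted):
        raise ValueError("messages must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(embedded, extracted) if a == b)
    return matches / len(embedded)


def is_authentic(embedded: Sequence[int], extracted: Sequence[int],
                 threshold: float = 0.85) -> bool:
    """Treat the image as untampered if enough bits survive.

    The threshold is an assumed value, not one taken from the paper.
    """
    return bit_accuracy(embedded, extracted) >= threshold


# Assumed 8-bit message embedded in the image.
message = [1, 0, 1, 1, 0, 0, 1, 0]

# Stand-in for extraction after a benign distortion: one bit flipped.
after_benign = [1, 0, 1, 1, 0, 0, 1, 1]

# Stand-in for extraction after a malicious edit: four bits flipped.
after_malicious = [0, 1, 1, 1, 0, 0, 0, 1]

print(is_authentic(message, after_benign))     # watermark survives
print(is_authentic(message, after_malicious))  # watermark broken
```

A semi-fragile scheme aims for exactly this split: the message stays recoverable after benign processing and degrades toward chance after malicious manipulation, so a single threshold on bit accuracy separates the two cases.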
Problem

Research questions and friction points this paper is trying to address.

Balancing robustness and fragility in deepfake watermarking systems
Generalizing detection capabilities across evolving deepfake techniques
Developing adaptive watermarking that is robust to benign distortions while remaining sensitive to malicious tampering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses latent space representations for watermark embedding
Employs multi-agent adversarial reinforcement learning paradigm
Balances robustness and fragility via dynamic curriculum learning
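The adversarial setup in the points above can be sketched as a pair of opposing reward signals. The source does not give the reward definitions, so everything here is an assumption: the zero-sum form, the function names, and the use of distance from chance-level bit accuracy (0.5) as the fragility score.

```python
def watermarker_reward(bit_acc: float, is_benign: bool) -> float:
    """Assumed reward for the watermarking agent, in [0, 1].

    Benign manipulation: reward high bit recovery (robustness).
    Malicious manipulation: reward bit accuracy close to chance (fragility),
    since 0.5 accuracy means the message carries no usable signal.
    """
    if not 0.0 <= bit_acc <= 1.0:
        raise ValueError("bit_acc must lie in [0, 1]")
    if is_benign:
        return bit_acc
    return 1.0 - 2.0 * abs(bit_acc - 0.5)


def attacker_reward(bit_acc: float, is_benign: bool) -> float:
    """Assumed zero-sum counterpart: the attacker gains what the watermarker loses."""
    return -watermarker_reward(bit_acc, is_benign)


# Watermark fully recovered after a benign distortion: best case for the watermarker.
print(watermarker_reward(1.0, is_benign=True))

# Watermark fully recovered after a malicious edit: the watermark failed to break.
print(watermarker_reward(1.0, is_benign=False))

# Watermark reduced to chance after a malicious edit: desired fragile behaviour.
print(watermarker_reward(0.5, is_benign=False))
```

Under this assumed formulation the attacker is pushed toward benign distortions that destroy the message and malicious edits that leave it intact, which is one way a curriculum of increasingly hard manipulations could emerge during training.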
Tharindu Fernando
Queensland University of Technology
human behaviour analysis, trajectory prediction, machine learning
C. Fookes
The Signal Processing, Artificial Intelligence and Vision Technologies (SAIVT), Queensland University of Technology, Australia
S. Sridharan
The Signal Processing, Artificial Intelligence and Vision Technologies (SAIVT), Queensland University of Technology, Australia