LAVA: Layered Audio-Visual Anti-tampering Watermarking for Robust Deepfake Detection and Localization

📅 2026-04-26

🤖 AI Summary
Existing deepfake detection methods struggle to robustly localize manipulated regions under realistic conditions such as audio-visual decoupling, compression artifacts, and multimodal asynchrony. This work proposes a calibration-aware audio-visual joint watermarking framework that, for the first time, enables collaborative semi-fragile watermark embedding. By integrating hierarchical tamper-resistant watermark design, cross-modal fusion strategies, and a calibration alignment mechanism, the method preserves the consistency of tampering evidence even under degradations like compression and audio-visual desynchronization. Experimental results demonstrate that the proposed approach achieves near-perfect detection performance (AP=0.999) across various challenging scenarios, significantly outperforming current audio-visual fusion baselines.
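
To make the idea of aligning evidence across desynchronized modalities concrete, here is a minimal sketch. It is not LAVA's actual calibration mechanism, which this summary does not specify. It assumes each modality already yields a per-step tamper score in [0, 1], estimates the audio delay with a plain unnormalized cross-correlation, and then averages the aligned scores. The names `best_delay`, `fuse`, `max_delay`, and `w_visual` are hypothetical.

```python
def best_delay(visual, audio, max_delay):
    """Estimate how many steps the audio scores lag behind the visual scores.

    Assumption: a simple unnormalized cross-correlation is good enough for
    illustration. This is a stand-in, not the method described in the paper.
    """
    def overlap_score(delay):
        # Correlate visual[i] with audio[i + delay] over the valid overlap.
        total = 0.0
        for i, v in enumerate(visual):
            j = i + delay
            if 0 <= j < len(audio):
                total += v * audio[j]
        return total

    return max(range(-max_delay, max_delay + 1), key=overlap_score)


def fuse(visual, audio, max_delay=3, w_visual=0.5):
    """Shift audio scores onto the visual timeline, then take a weighted mean."""
    delay = best_delay(visual, audio, max_delay)
    fused = []
    for i, v in enumerate(visual):
        j = i + delay
        if 0 <= j < len(audio):
            fused.append(w_visual * v + (1.0 - w_visual) * audio[j])
        else:
            fused.append(v)  # no aligned audio evidence for this step
    return delay, fused


# Hypothetical scores: the same tampered span, with audio arriving 2 steps late.
visual = [0.0, 0.1, 0.9, 0.8, 0.1, 0.0, 0.0, 0.0]
audio = [0.0, 0.0, 0.0, 0.1, 0.9, 0.8, 0.1, 0.0]

delay, fused = fuse(visual, audio)
print(delay, fused)
```

Without the alignment step, averaging the two raw score tracks would smear the tampered span over both locations, which is the kind of inconsistency the summary attributes to audio-visual desynchronization.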

📝 Abstract
Proactive watermarking offers a promising approach for deepfake tamper detection and localization in short-form videos. However, existing methods often decouple audio and visual evidence and assume that watermark signals remain reliable under real-world degradations, making tamper localization vulnerable to multimodal misalignment and compression distortions. Moreover, existing semi-fragile visual watermarking methods often degrade significantly under codec compression because their embedding bands overlap with compression-sensitive frequency regions. To address these limitations, we propose Layered Audio-Visual Anti-tampering Watermarking (LAVA), a calibration-aware audio-visual watermark fusion framework for deepfake tamper detection and localization. LAVA leverages cross-modal watermark fusion and calibration-aware alignment to preserve consistent and reliable tamper evidence under compression and audio-visual asynchrony, enabling robust tamper localization. Extensive experiments demonstrate that LAVA achieves near-perfect detection performance (AP = 0.999), remains robust to compression and multimodal misalignment, and significantly improves tamper localization reliability over existing audio-visual fusion baselines.
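
For readers unfamiliar with the AP figure quoted above, the following sketch computes average precision under its common non-interpolated definition: the mean of precision-at-rank over the ranks where a positive item appears. The abstract does not state the paper's exact evaluation protocol, so this is an assumption. The labels and scores are made-up illustration data, not results from the paper.

```python
def average_precision(labels, scores):
    """Non-interpolated AP: mean precision at the rank of each positive item.

    labels: 1 for tampered (positive), 0 for authentic.
    scores: detector confidence, higher means more likely tampered.
    Tied scores keep their input order (no special tie handling).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("average precision is undefined without positive labels")
    return sum(precisions) / len(precisions)


# Hypothetical toy data: positives land at ranks 1, 3, and 4.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.1]

ap = average_precision(labels, scores)
print(round(ap, 4))  # (1/1 + 2/3 + 3/4) / 3
```

An AP of 0.999 therefore means that almost every tampered sample is ranked above almost every authentic one across the whole score range.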
Problem

Research questions and friction points this paper is trying to address.

deepfake detection
audio-visual anti-tampering
watermarking
compression robustness
tamper localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

audio-visual watermarking
deepfake detection
tamper localization
calibration-aware alignment
robust watermarking