Mining Forgery Traces from Reconstruction Error: A Weakly Supervised Framework for Multimodal Deepfake Temporal Localization

📅 2026-01-29
🤖 AI Summary
This work addresses the challenge of temporally localizing deepfake manipulations in videos under weak supervision, where only video-level labels are available and forgeries exhibit localized and intermittent characteristics. The authors propose RT-DeepLoc, a novel framework that leverages reconstruction error as a weakly supervised signal for temporal forgery localization. Specifically, a Masked Autoencoder trained exclusively on authentic videos is employed to detect anomalies through elevated reconstruction errors induced by manipulated segments. To further enhance discriminability without frame-level annotations, the method introduces an asymmetric intra-video contrastive loss that promotes compactness of genuine features, thereby establishing a robust decision boundary. Experiments demonstrate that RT-DeepLoc achieves state-of-the-art performance in weakly supervised temporal deepfake localization on large-scale benchmarks such as LAV-DF and exhibits strong generalization to unseen forgery types.

📝 Abstract
Modern deepfakes have evolved into localized and intermittent manipulations that require fine-grained temporal localization. The prohibitive cost of frame-level annotation makes weakly supervised methods, which rely only on video-level labels, a practical necessity. To this end, we propose Reconstruction-based Temporal Deepfake Localization (RT-DeepLoc), a weakly supervised temporal forgery localization framework that identifies forgeries via reconstruction errors. Our framework uses a Masked Autoencoder (MAE) trained exclusively on authentic data to learn the intrinsic spatiotemporal patterns of that data; as a result, the model produces significant reconstruction discrepancies for forged segments, which supply the fine-grained localization cues that video-level labels lack. To leverage these indicators robustly, we introduce a novel Asymmetric Intra-video Contrastive Loss (AICL). By focusing on the compactness of authentic features, guided by the reconstruction cues, AICL establishes a stable decision boundary that enhances local discrimination while preserving generalization to unseen forgeries. Extensive experiments on large-scale datasets, including LAV-DF, demonstrate that RT-DeepLoc achieves state-of-the-art performance in weakly supervised temporal forgery localization.
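To make the reconstruction-error idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a per-frame reconstruction error has already been computed by some autoencoder trained on authentic video (that model is not shown). The function name `localize_segments`, the median-plus-MAD threshold, and the `k`, `fps`, and `min_len` parameters are all hypothetical choices made for illustration; the abstract does not specify how RT-DeepLoc converts errors into segments.

```python
from statistics import median


def localize_segments(errors, fps=25.0, k=3.0, min_len=3):
    """Turn per-frame reconstruction errors into (start_s, end_s) segments.

    Hypothetical post-processing: frames whose error exceeds a robust
    threshold are treated as suspicious, and runs of at least `min_len`
    consecutive suspicious frames are reported as segments (in seconds).
    """
    # Robust threshold (an assumption): median + k * median absolute deviation.
    med = median(errors)
    mad = median(abs(e - med) for e in errors)
    threshold = med + k * mad

    segments = []
    start = None  # index of the first frame in the current suspicious run
    for i, e in enumerate(errors):
        if e > threshold and start is None:
            start = i
        elif e <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start / fps, i / fps))
            start = None
    # Close a run that extends to the end of the video.
    if start is not None and len(errors) - start >= min_len:
        segments.append((start / fps, len(errors) / fps))
    return segments


# Toy example: frames 4-6 reconstruct poorly, so they form one segment.
toy_errors = [0.10, 0.12, 0.11, 0.10, 0.90, 0.95, 0.92, 0.10, 0.11, 0.12]
toy_segments = localize_segments(toy_errors, fps=10.0)
```

A fixed statistical threshold like this is only a baseline; the abstract indicates that RT-DeepLoc instead uses the reconstruction cues to guide a learned decision boundary through AICL.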
Problem

Research questions and friction points this paper is trying to address.

deepfake
temporal localization
weakly supervised
forgery detection
reconstruction error
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstruction Error
Weakly Supervised Learning
Temporal Deepfake Localization
Masked Autoencoder
Asymmetric Intra-video Contrastive Loss
Midou Guo
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Qilin Yin
Alibaba Group, Hangzhou, China
Wei Lu
Sun Yat-sen University
computer science
Xiangyang Luo
Zhengzhou Information Science and Technology Institute
information hiding, data hiding, steganography
Rui Yang
Alibaba Group, Hangzhou, China