Mining Forgery Traces from Reconstruction Error: A Weakly Supervised Framework for Multimodal Deepfake Temporal Localization

📅 2026-01-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of temporally localizing deepfake manipulations in videos under weak supervision, where only video-level labels are available and forgeries exhibit localized and intermittent characteristics. The authors propose RT-DeepLoc, a novel framework that leverages reconstruction error as a weakly supervised signal for temporal forgery localization. Specifically, a Masked Autoencoder trained exclusively on authentic videos is employed to detect anomalies through elevated reconstruction errors induced by manipulated segments. To further enhance discriminability without frame-level annotations, the method introduces an asymmetric intra-video contrastive loss that promotes compactness of genuine features, thereby establishing a robust decision boundary. Experiments demonstrate that RT-DeepLoc achieves state-of-the-art performance in weakly supervised temporal deepfake localization on large-scale benchmarks such as LAV-DF and exhibits strong generalization to unseen forgery types.
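The localization idea in this summary can be illustrated with a minimal sketch, assuming per-frame feature vectors and their reconstructions are already available. The function names `frame_errors` and `localize`, the fixed threshold, and the toy data are hypothetical choices made for illustration; they are not the paper's implementation, and no actual Masked Autoencoder is run here.

```python
from typing import List, Tuple


def frame_errors(original: List[List[float]],
                 reconstructed: List[List[float]]) -> List[float]:
    """Mean squared reconstruction error per frame (one feature vector per frame)."""
    errors = []
    for x, x_hat in zip(original, reconstructed):
        errors.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))
    return errors


def localize(errors: List[float], threshold: float,
             min_len: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges whose error stays above the threshold."""
    segments = []
    start = None
    for i, e in enumerate(errors):
        if e > threshold and start is None:
            start = i  # a high-error run begins
        elif e <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # the run has ended
    # Close a run that extends to the final frame.
    if start is not None and len(errors) - start >= min_len:
        segments.append((start, len(errors)))
    return segments


# Toy data (assumed): frames 2 and 3 are reconstructed poorly, standing in
# for a manipulated segment that a model trained on authentic video cannot
# reproduce well.
original = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
reconstructed = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [0.1, 0.0], [0.0, 0.0]]

errors = frame_errors(original, reconstructed)
segments = localize(errors, threshold=0.25)
print(segments)  # [(2, 4)]
```

In this toy case the high-error run covers frames 2 and 3, so the sketch reports the half-open range (2, 4). A fixed threshold is only a stand-in: the summary says the reconstruction error serves as a weak supervisory cue, not as the final decision rule.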

📝 Abstract

Modern deepfakes have evolved into localized and intermittent manipulations that require fine-grained temporal localization. The prohibitive cost of frame-level annotation makes weakly supervised methods, which rely only on video-level labels, a practical necessity. To this end, we propose Reconstruction-based Temporal Deepfake Localization (RT-DeepLoc), a weakly supervised temporal forgery localization framework that identifies forgeries via reconstruction errors. Our framework uses a Masked Autoencoder (MAE) trained exclusively on authentic data to learn the intrinsic spatiotemporal patterns of that data; as a result, the model produces significant reconstruction discrepancies for forged segments, which provide the fine-grained localization cues that video-level labels lack. To robustly leverage these indicators, we introduce a novel Asymmetric Intra-video Contrastive Loss (AICL). By focusing on the compactness of authentic features guided by these reconstruction cues, AICL establishes a stable decision boundary that enhances local discrimination while preserving generalization to unseen forgeries. Extensive experiments on large-scale datasets, including LAV-DF, demonstrate that RT-DeepLoc achieves state-of-the-art performance in weakly supervised temporal forgery localization.
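The abstract does not give the AICL formula, so the following is only one possible reading of "asymmetric" and "compactness of authentic features": authentic frames are pulled toward their own centroid, while suspected forged frames are merely pushed beyond a margin and never clustered together. The function name `asymmetric_compactness_loss`, the `margin` parameter, and the use of a boolean suspect mask derived from reconstruction error are all assumptions made for illustration, not the authors' definition.

```python
import math
from typing import List


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def asymmetric_compactness_loss(features: List[List[float]],
                                is_suspect: List[bool],
                                margin: float = 1.0) -> float:
    """Hypothetical loss: compact authentic frames, repel suspected forgeries.

    is_suspect[i] is assumed to come from a reconstruction-error cue
    (True means frame i was reconstructed poorly).
    """
    authentic = [f for f, s in zip(features, is_suspect) if not s]
    suspect = [f for f, s in zip(features, is_suspect) if s]
    if not authentic:
        return 0.0
    center = centroid(authentic)
    # Pull term: mean squared distance of authentic frames to their centroid.
    pull = sum(distance(f, center) ** 2 for f in authentic) / len(authentic)
    # Push term: suspects are penalized only while inside the margin; there
    # is no term pulling suspects toward each other (the asymmetry).
    push = 0.0
    if suspect:
        push = sum(max(0.0, margin - distance(f, center)) ** 2
                   for f in suspect) / len(suspect)
    return pull + push


# Toy features for one video (assumed): two authentic frames, one suspect.
features = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
is_suspect = [False, False, True]
loss = asymmetric_compactness_loss(features, is_suspect, margin=1.0)
print(loss)  # 2.0
```

Here the authentic centroid is (1, 0), so the pull term is 1.0, and the suspect frame sits exactly on the centroid, giving a push term of 1.0. Leaving forged frames unclustered is one way to avoid overfitting to the forgery types seen in training, which is consistent with the generalization claim in the abstract, though the actual mechanism in the paper may differ.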

Problem

Research questions and friction points this paper is trying to address.

deepfake

temporal localization

weakly supervised

forgery detection

reconstruction error

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstruction Error

Weakly Supervised Learning

Temporal Deepfake Localization