🤖 AI Summary
This work addresses the challenge of temporally localizing deepfake manipulations in videos under weak supervision, where only video-level labels are available and forgeries exhibit localized and intermittent characteristics. The authors propose RT-DeepLoc, a novel framework that leverages reconstruction error as a weakly supervised signal for temporal forgery localization. Specifically, a Masked Autoencoder trained exclusively on authentic videos is employed to detect anomalies through elevated reconstruction errors induced by manipulated segments. To further enhance discriminability without frame-level annotations, the method introduces an asymmetric intra-video contrastive loss that promotes compactness of genuine features, thereby establishing a robust decision boundary. Experiments demonstrate that RT-DeepLoc achieves state-of-the-art performance in weakly supervised temporal deepfake localization on large-scale benchmarks such as LAV-DF and exhibits strong generalization to unseen forgery types.
📝 Abstract
Modern deepfakes have evolved into localized and intermittent manipulations that require fine-grained temporal localization. Because frame-level annotation is prohibitively expensive, weakly supervised methods that rely only on video-level labels are a practical necessity. To this end, we propose Reconstruction-based Temporal Deepfake Localization (RT-DeepLoc), a weakly supervised temporal forgery localization framework that identifies forgeries via reconstruction errors. Our framework uses a Masked Autoencoder (MAE) trained exclusively on authentic data to learn the intrinsic spatiotemporal patterns of that data; as a result, the model produces significant reconstruction discrepancies for forged segments, which supply the fine-grained cues that video-level labels lack. To leverage these cues robustly, we introduce a novel Asymmetric Intra-video Contrastive Loss (AICL). By focusing on the compactness of authentic features, guided by the reconstruction cues, AICL establishes a stable decision boundary that enhances local discrimination while preserving generalization to unseen forgeries. Extensive experiments on large-scale datasets, including LAV-DF, demonstrate that RT-DeepLoc achieves state-of-the-art performance in weakly supervised temporal forgery localization.
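The core idea of using reconstruction error as a localization cue can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes per-frame feature vectors and their reconstructions are already available from some autoencoder trained on authentic video, and it replaces the learned decision boundary (AICL) with a simple robust threshold (median plus `k` times the median absolute deviation). The function names, the parameter `k`, and the thresholding rule are all hypothetical choices made for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple


def reconstruction_errors(
    original: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
) -> List[float]:
    """Mean squared error per frame between features and their reconstruction."""
    errors = []
    for x, y in zip(original, reconstructed):
        errors.append(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
    return errors


def localize(errors: Sequence[float], k: float = 3.0) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges with unusually high error.

    Assumption: a frame is suspicious when its error exceeds
    median + k * MAD of the errors within the same video.
    """
    med = median(errors)
    mad = median([abs(e - med) for e in errors])
    threshold = med + k * max(mad, 1e-8)  # guard against MAD == 0

    segments: List[Tuple[int, int]] = []
    start = None
    for i, e in enumerate(errors):
        if e > threshold and start is None:
            start = i  # a suspicious run begins
        elif e <= threshold and start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(errors) - 1))  # run extends to the last frame
    return segments


# Toy per-frame errors: frames 3-4 and 7 reconstruct poorly.
toy_errors = [0.10, 0.12, 0.11, 0.95, 0.90, 0.10, 0.13, 0.88, 0.11, 0.12]
print(localize(toy_errors))
```

Computing the threshold within each video mirrors the intra-video framing of the abstract only loosely: authentic frames in the same video set the baseline, and frames that deviate sharply from it are grouped into candidate forged segments.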