GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Weakly supervised temporal forgery localization faces four challenges: a mismatch between training and inference objectives, limited supervision from binary video-level labels, gradient blockage caused by non-differentiable top-k aggregation, and no explicit modeling of relationships among candidate segments. To address them, this work proposes GEM-TFL, a two-phase classification-regression framework. It uses an EM-based optimization process to reformulate binary labels as multi-dimensional latent attributes, which enriches the supervisory signal. A training-free temporal consistency refinement realigns frame-level predictions for smoother temporal dynamics, and a graph-based proposal refinement module models the temporal-semantic relationships among candidate segments to produce globally consistent confidence estimates. According to the authors, experiments on benchmark datasets show more accurate and robust localization and a substantially narrower gap with fully supervised methods.
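
The gradient issue named in the summary can be illustrated with a small sketch. It assumes the common weakly supervised setup in which the video-level score is the mean of the k highest frame scores; the function names and numbers are hypothetical and are not taken from the paper. Under that assumption, only the k selected frames receive any gradient from the video-level loss, and the hard selection itself carries no gradient.

```python
def topk_mean(scores, k):
    """Video-level score as the mean of the k largest frame scores.

    Returns the pooled score and the indices of the selected frames.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[:k]
    return sum(scores[i] for i in top) / k, top


def topk_mean_grad(scores, k):
    """Gradient of topk_mean with respect to each frame score.

    Selected frames get 1/k; every other frame gets exactly 0,
    so those frames receive no learning signal from this pooling step.
    """
    _, top = topk_mean(scores, k)
    selected = set(top)
    return [1.0 / k if i in selected else 0.0 for i in range(len(scores))]


# Illustrative frame-level forgery scores (made-up values).
frame_scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
video_score, picked = topk_mean(frame_scores, k=2)
grad = topk_mean_grad(frame_scores, k=2)
print(video_score, picked, grad)
```

In this toy case four of the six frames have zero gradient, which is one way to read the "gradient blockage" that the paper sets out to mitigate.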

📝 Abstract
Temporal Forgery Localization (TFL) aims to precisely identify manipulated segments within videos or audio streams, providing interpretable evidence for multimedia forensics and security. While most existing TFL methods rely on dense frame-level labels in a fully supervised manner, Weakly Supervised TFL (WS-TFL) reduces labeling cost by learning only from binary video-level labels. However, current WS-TFL approaches suffer from mismatched training and inference objectives, limited supervision from binary labels, gradient blockage caused by non-differentiable top-k aggregation, and the absence of explicit modeling of inter-proposal relationships. To address these issues, we propose GEM-TFL (Graph-based EM-powered Temporal Forgery Localization), a two-phase classification-regression framework that effectively bridges the supervision gap between training and inference. Built upon this foundation, (1) we enhance weak supervision by reformulating binary labels into multi-dimensional latent attributes through an EM-based optimization process; (2) we introduce a training-free temporal consistency refinement that realigns frame-level predictions for smoother temporal dynamics; and (3) we design a graph-based proposal refinement module that models temporal-semantic relationships among proposals for globally consistent confidence estimation. Extensive experiments on benchmark datasets demonstrate that GEM-TFL achieves more accurate and robust temporal forgery localization, substantially narrowing the gap with fully supervised methods.
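
To make the idea of a training-free temporal refinement concrete, here is a minimal sketch. The abstract does not specify the refinement rule, so this example substitutes a plain centered moving average followed by thresholding; it is an assumption for illustration, not the authors' method, and `smooth_scores`, `segments_above`, the window size, and the threshold are all hypothetical.

```python
def smooth_scores(scores, window=3):
    """Centered moving average over frame scores; no learned parameters."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for t in range(len(scores)):
        lo = max(0, t - half)                 # clip the window at the start
        hi = min(len(scores), t + half + 1)   # clip the window at the end
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def segments_above(scores, threshold=0.5):
    """Return half-open (start, end) frame ranges whose scores reach the threshold."""
    segments, start = [], None
    for t, s in enumerate(scores):
        if s >= threshold and start is None:
            start = t
        elif s < threshold and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments


# Made-up frame scores with a one-frame dip inside a forged region.
raw = [0.1, 0.1, 0.9, 0.2, 0.9, 0.9, 0.1, 0.1]
smoothed = smooth_scores(raw, window=3)
raw_segments = segments_above(raw)
smooth_segments = segments_above(smoothed)
print(raw_segments, smooth_segments)
```

On this toy input the raw scores yield two fragments while the smoothed scores yield one contiguous segment. A simple average also shifts segment boundaries, which is one reason a real refinement step would likely be more careful than this sketch.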
Problem

Research questions and friction points this paper is trying to address.

Temporal Forgery Localization
Weak Supervision
Label Mismatch
Gradient Blockage
Inter-proposal Relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weakly Supervised Learning
Temporal Forgery Localization
EM Algorithm
Graph-based Refinement
Temporal Consistency