Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer

📅 2025-04-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses three key challenges in weakly supervised temporal action localization (WTAL): low-quality pseudo-labels, insufficient exploitation of multi-granularity priors, and difficulty in training with noisy labels. It proposes PseudoFormer, a dual-branch framework that introduces: (1) RickerFusion, a novel global feature mapping mechanism that enables high-fidelity cross-scale pseudo-label generation; (2) joint modeling of snippet-level and proposal-level weak supervision, establishing dual-path supervision for complementary granularity-aware learning; and (3) an uncertainty-aware masking strategy coupled with iterative pseudo-label refinement and an uncertainty-weighted loss to enhance robustness against label noise. Evaluated on THUMOS14 and ActivityNet 1.3, PseudoFormer achieves state-of-the-art WTAL performance and narrows the gap between weakly and fully supervised approaches. Ablation studies validate the contribution of each component.

📝 Abstract
Weakly-supervised Temporal Action Localization (WTAL) has achieved notable success but still suffers from a lack of temporal annotations, leading to a performance and framework gap compared with fully-supervised methods. While recent approaches employ pseudo labels for training, three key challenges remain unresolved: generating high-quality pseudo labels, making full use of different priors, and optimizing training methods with noisy labels. Motivated by these challenges, we propose PseudoFormer, a novel two-branch framework that bridges the gap between weakly and fully-supervised Temporal Action Localization (TAL). We first introduce RickerFusion, which maps all predicted action proposals to a global shared space to generate higher-quality pseudo labels. Subsequently, we leverage both snippet-level and proposal-level labels with different priors from the weak branch to train the regression-based model in the full branch. Finally, an uncertainty mask and an iterative refinement mechanism are applied for training with noisy pseudo labels. PseudoFormer achieves state-of-the-art WTAL results on two commonly used benchmarks, THUMOS14 and ActivityNet 1.3. In addition, extensive ablation studies demonstrate the contribution of each component of our method.
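The abstract does not specify how RickerFusion maps proposals into a shared space, so the following is only a minimal sketch of one plausible reading. It assumes, going by the name alone, that each proposal is rendered as a score-weighted Ricker ("Mexican hat") wavelet on a single snippet timeline and that pseudo labels are read off by thresholding the summed response. The function names, the choice of sigma as half the proposal length, and the threshold are all hypothetical and are not taken from the paper.

```python
import math


def ricker(t, sigma):
    """Unnormalized Ricker (Mexican hat) wavelet: positive for |t| < sigma."""
    x = (t / sigma) ** 2
    return (1.0 - x) * math.exp(-x / 2.0)


def fuse_proposals(proposals, num_snippets):
    """Sum score-weighted wavelets of all proposals on one shared timeline.

    Each proposal is (start, end, score) in snippet units. Assumption:
    sigma is half the proposal length, so the wavelet is positive inside
    the proposal and negative just outside its boundaries.
    """
    timeline = [0.0] * num_snippets
    for start, end, score in proposals:
        center = (start + end) / 2.0
        sigma = max((end - start) / 2.0, 1e-6)
        for t in range(num_snippets):
            timeline[t] += score * ricker(t - center, sigma)
    return timeline


def extract_segments(timeline, threshold):
    """Return half-open (start, end) index runs where the response exceeds threshold."""
    segments, start = [], None
    for t, value in enumerate(timeline):
        if value > threshold and start is None:
            start = t
        elif value <= threshold and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(timeline)))
    return segments


# Two overlapping proposals for one action and a separate third proposal.
proposals = [(10, 20, 0.9), (11, 21, 0.8), (40, 50, 0.7)]
timeline = fuse_proposals(proposals, num_snippets=60)
pseudo_labels = extract_segments(timeline, threshold=0.3)
```

In this toy setup the two overlapping proposals reinforce each other and collapse into a single pseudo-label segment, while the negative side lobes of the wavelet suppress responses just outside each proposal. Whether the paper's mechanism behaves this way is not stated in the abstract.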
Problem

Research questions and friction points this paper is trying to address.

Improving weakly-supervised temporal action localization quality
Utilizing snippet-level and proposal-level labels effectively
Optimizing training with noisy pseudo labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

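RickerFusion maps all predicted action proposals into a global shared space to generate higher-quality pseudo labels
Snippet-level and proposal-level labels, carrying different priors from the weak branch, jointly train the regression-based full branch
An uncertainty mask and iterative refinement make training robust to noisy pseudo labels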
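The uncertainty mask and uncertainty-weighted loss mentioned above can be illustrated with a small sketch. The source does not define how uncertainty is measured or how the loss is weighted, so everything here is an assumption: uncertainty is taken to be the binary entropy of the weak branch's per-snippet probability, snippets above a hypothetical `max_uncertainty` are masked out, and the remaining binary cross-entropy terms are weighted by `1 - uncertainty`. Iterative refinement is not shown.

```python
import math


def binary_entropy(p, eps=1e-7):
    """Entropy of Bernoulli(p) in bits, so the result lies in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def masked_weighted_bce(predictions, pseudo_probs, max_uncertainty=0.9, eps=1e-7):
    """Binary cross-entropy against hard pseudo labels.

    Assumptions (hypothetical, not from the paper):
      * uncertainty of a snippet = binary entropy of its pseudo-label probability
      * snippets with uncertainty > max_uncertainty are masked out entirely
      * remaining snippets are weighted by (1 - uncertainty)
    """
    total, weight_sum = 0.0, 0.0
    for pred, prob in zip(predictions, pseudo_probs):
        uncertainty = binary_entropy(prob)
        if uncertainty > max_uncertainty:
            continue  # uncertainty mask: skip unreliable pseudo labels
        target = 1.0 if prob >= 0.5 else 0.0
        pred = min(max(pred, eps), 1.0 - eps)
        bce = -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))
        weight = 1.0 - uncertainty
        total += weight * bce
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else 0.0


# The third snippet has a pseudo-label probability of 0.5 (maximal entropy),
# so it is masked and does not contribute to the loss.
predictions = [0.9, 0.2, 0.6]
pseudo_probs = [0.95, 0.05, 0.5]
loss = masked_weighted_bce(predictions, pseudo_probs)
```

Normalizing by the sum of weights keeps the loss scale comparable when many snippets are masked. A real training setup would compute this on tensors in a deep-learning framework rather than Python lists.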
Ziyi Liu
University of Science and Technology Beijing
Yangcen Liu
Georgia Institute of Technology
robotics, computer vision