Dual Guidance Semi-Supervised Action Detection

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance degradation in spatiotemporal action localization caused by scarce annotated data, this paper proposes a dual-guided semi-supervised learning framework. The method combines frame-level action classification with bounding-box prediction and enforces action-class consistency across frames and across boxes to improve pseudo-label quality. It further introduces multi-task joint training and a dynamic-threshold pseudo-label generation mechanism to make better use of unlabeled video data. Experiments on UCF101-24, J-HMDB-21, and AVA show that the approach outperforms image-level semi-supervised methods extended to this task. It maintains strong performance in low-label regimes, using only 10%–30% of the annotated data, and so extends semi-supervised learning from image classification to video action localization.

📝 Abstract
Semi-Supervised Learning (SSL) has shown tremendous potential to improve the predictive performance of deep learning models when annotations are hard to obtain. However, the application of SSL has so far been mainly studied in the context of image classification. In this work, we present a semi-supervised approach for spatial-temporal action localization. We introduce a dual guidance network to select better pseudo-bounding boxes. It combines frame-level classification with bounding-box prediction to enforce action class consistency across frames and boxes. Our evaluation across well-known spatial-temporal action localization datasets, namely UCF101-24, J-HMDB-21, and AVA, shows that the proposed module considerably enhances the model's performance in limited labeled data settings. Our framework achieves superior results compared to extended image-based semi-supervised baselines.
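
The abstract does not spell out how the two predictions are combined. As a minimal sketch, assuming a single-label setting and per-class confidence scores from both a frame-level head and a box head, one simple reading is to keep a predicted box as a pseudo-label only when its top class agrees with the frame-level top class and both predictions are confident. The names `BoxPrediction` and `select_pseudo_boxes` and the two fixed thresholds are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class BoxPrediction:
    """Hypothetical container for one predicted box on an unlabeled frame."""
    box: Box
    class_scores: List[float]  # per-class confidence from the box head


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def select_pseudo_boxes(
    frame_scores: Sequence[float],
    boxes: Sequence[BoxPrediction],
    frame_thresh: float = 0.5,  # assumed value, for illustration only
    box_thresh: float = 0.5,    # assumed value, for illustration only
) -> List[Tuple[Box, int]]:
    """Keep boxes whose top class matches a confident frame-level class."""
    frame_cls = argmax(frame_scores)
    # If the frame-level prediction is itself unreliable, keep nothing.
    if frame_scores[frame_cls] < frame_thresh:
        return []
    kept = []
    for pred in boxes:
        box_cls = argmax(pred.class_scores)
        # Class agreement between the two heads, plus a box confidence check.
        if box_cls == frame_cls and pred.class_scores[box_cls] >= box_thresh:
            kept.append((pred.box, box_cls))
    return kept


frame_scores = [0.1, 0.8, 0.1]
boxes = [
    BoxPrediction((10, 20, 50, 90), [0.05, 0.9, 0.05]),  # agrees, confident
    BoxPrediction((60, 20, 90, 90), [0.7, 0.2, 0.1]),    # disagrees with frame
    BoxPrediction((0, 0, 30, 30), [0.3, 0.4, 0.3]),      # agrees, low confidence
]
print(select_pseudo_boxes(frame_scores, boxes))
```

Only the first box survives the filter in this example. A multi-label dataset such as AVA would need per-class agreement rather than a single argmax, which this sketch does not attempt.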

Problem

Research questions and friction points this paper is trying to address.

Semi-supervised learning has mainly been studied for image classification, not spatial-temporal action localization
Pseudo-bounding boxes predicted on unlabeled video must be selected reliably
Annotations are hard to obtain, so models must perform well in limited labeled data settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual guidance network for pseudo-bounding box selection
Combines frame-level classification and bounding-box prediction
Enhances performance in limited labeled data settings