Boundary-Centric Active Learning for Temporal Action Segmentation

📅 2026-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high annotation cost of temporal action segmentation, where errors concentrate at action boundaries and small shifts in boundary placement strongly affect segmental metrics. The authors propose B-ACT, a clip-budgeted active learning framework that concentrates annotation effort on action boundary regions. B-ACT uses a two-stage loop: it first ranks and queries unlabeled videos by predictive uncertainty, then detects candidate transitions in each selected video from the current model's predictions and selects the top-$K$ boundaries with a boundary score that fuses neighborhood uncertainty, class ambiguity, and temporal predictive dynamics. Annotators label only the boundary frames, while the model still trains on boundary-centered clips so that its receptive field can exploit the surrounding temporal context. On GTEA, 50Salads, and Breakfast, B-ACT consistently surpasses representative active learning baselines for temporal action segmentation and prior state of the art under sparse budgets, with the largest gains on datasets where boundary placement dominates edit and overlap-based F1 scores.
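The first stage, ranking unlabeled videos by prediction uncertainty, might look roughly like the sketch below. The summary does not say which uncertainty measure B-ACT uses, so mean per-frame Shannon entropy is an assumption here, and the names `frame_entropy`, `video_uncertainty`, and `rank_videos` are hypothetical.

```python
import math

def frame_entropy(probs):
    """Shannon entropy (in nats) of one frame's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def video_uncertainty(frame_probs):
    """Mean per-frame entropy. ASSUMPTION: the paper's exact video-level
    uncertainty measure is not given in the summary."""
    return sum(frame_entropy(p) for p in frame_probs) / len(frame_probs)

def rank_videos(predictions, num_queries):
    """Return the ids of the `num_queries` most uncertain videos."""
    ranked = sorted(
        predictions,
        key=lambda vid: video_uncertainty(predictions[vid]),
        reverse=True,
    )
    return ranked[:num_queries]

# Toy softmax outputs: video id -> list of per-frame class probabilities.
predictions = {
    "confident": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],
    "ambiguous": [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]],
    "mixed": [[0.90, 0.05, 0.05], [0.50, 0.30, 0.20]],
}
selected = rank_videos(predictions, num_queries=2)
```

In this toy example the two videos with the flattest predictions are queried first; a real pipeline would compute the same score from the segmentation model's softmax outputs on the unlabeled pool.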

📝 Abstract

Temporal action segmentation (TAS) demands dense temporal supervision, yet most of the annotation cost in untrimmed videos is spent identifying and refining action transitions, where segmentation errors concentrate and small temporal shifts disproportionately degrade segmental metrics. We introduce B-ACT, a clip-budgeted active learning framework that explicitly allocates supervision to these high-leverage boundary regions. B-ACT operates in a hierarchical two-stage loop: (i) it ranks and queries unlabeled videos using predictive uncertainty, and (ii) within each selected video, it detects candidate transitions from the current model predictions and selects the top-$K$ boundaries via a novel boundary score that fuses neighborhood uncertainty, class ambiguity, and temporal predictive dynamics. Importantly, our annotation protocol requests labels for only the boundary frames while still training on boundary-centered clips to exploit temporal context through the model's receptive field. Extensive experiments on GTEA, 50Salads, and Breakfast demonstrate that boundary-centric supervision delivers strong label efficiency and consistently surpasses representative TAS active learning baselines and prior state of the art under sparse budgets, with the largest gains on datasets where boundary placement dominates edit and overlap-based F1 scores.
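As a rough illustration of stage (ii), the sketch below detects candidate transitions from per-frame predictions and ranks them with a three-term score. The abstract names the three ingredients but not their formulas, so every definition here is an assumption: neighborhood uncertainty is taken as normalized mean entropy in a window, class ambiguity as one minus the top-2 probability margin, and temporal dynamics as the total variation distance between consecutive frames. The function names, the `radius` parameter, and the equal weights are likewise hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one frame's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def candidate_boundaries(frame_probs):
    """Frames t whose predicted (argmax) label differs from frame t-1."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]

def boundary_score(frame_probs, t, radius=2, weights=(1.0, 1.0, 1.0)):
    """ASSUMED fusion of the three cues named in the abstract (t >= 1)."""
    lo, hi = max(0, t - radius), min(len(frame_probs), t + radius + 1)
    window = frame_probs[lo:hi]
    num_classes = len(frame_probs[0])
    # Neighborhood uncertainty: mean window entropy, scaled to [0, 1].
    neighborhood = sum(entropy(p) for p in window) / (
        len(window) * math.log(num_classes)
    )
    # Class ambiguity: small margin between the top-2 classes at frame t.
    top2 = sorted(frame_probs[t], reverse=True)[:2]
    ambiguity = 1.0 - (top2[0] - top2[1])
    # Temporal dynamics: total variation distance between frames t-1 and t.
    dynamics = 0.5 * sum(
        abs(a - b) for a, b in zip(frame_probs[t], frame_probs[t - 1])
    )
    w_n, w_a, w_d = weights
    return w_n * neighborhood + w_a * ambiguity + w_d * dynamics

def select_top_k(frame_probs, k, radius=2):
    """Return the k highest-scoring candidate boundaries, in temporal order."""
    candidates = candidate_boundaries(frame_probs)
    ranked = sorted(
        candidates,
        key=lambda t: boundary_score(frame_probs, t, radius),
        reverse=True,
    )
    return sorted(ranked[:k])

# Toy predictions: a hesitant transition near frame 3, a sharp one at frame 6.
frames = [
    [0.95, 0.03, 0.02], [0.94, 0.04, 0.02], [0.55, 0.43, 0.02],
    [0.45, 0.53, 0.02], [0.05, 0.93, 0.02], [0.04, 0.94, 0.02],
    [0.03, 0.02, 0.95], [0.02, 0.02, 0.96],
]
candidates = candidate_boundaries(frames)
queried = select_top_k(frames, k=1)
```

With equal weights, the hesitant transition outranks the sharp one because its neighborhood is more uncertain and its top-2 margin is smaller. Only the frames in `queried` would be sent for annotation, while training would use a clip centered on each of them; the clip length is not specified in the abstract.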

Problem

Research questions and friction points this paper is trying to address.

Temporal Action Segmentation

Boundary Annotation

Label Efficiency

Action Transitions

Active Learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

boundary-centric active learning

temporal action segmentation

annotation efficiency