Improving Temporal Action Segmentation via Constraint-Aware Decoding

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses persistent challenges in temporal action segmentation, namely action variability, ambiguous boundaries, and high annotation costs, which are most acute in low-resource or novel domains. The authors propose a lightweight constraint-aware decoding framework that refines predictions at inference time without retraining and without adding model complexity. The framework extracts statistical structural priors directly from annotated data (transition confidence, action boundary sets, and per-class duration) and integrates them into a modified Viterbi decoding algorithm that corrects structural prediction errors. It applies to both fully and semi-supervised models and improves their segmentation results while remaining efficient.
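
To make the idea of priors "extracted from annotated data" concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names (`to_segments`, `extract_priors`) are hypothetical, and the definitions used here are assumptions made for illustration: transition confidence is taken to be the fraction of observed switches out of a class that go to a given next class, the boundary set is the set of class pairs ever seen at a segment boundary, and per-class duration is the mean segment length in frames.

```python
from collections import Counter, defaultdict
from itertools import groupby


def to_segments(frame_labels):
    """Collapse per-frame labels into (label, length) runs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(frame_labels)]


def extract_priors(annotated_videos):
    """Estimate simple structural priors from per-frame ground-truth labels.

    All three statistics are illustrative assumptions, not the paper's
    exact definitions.
    """
    transition_counts = Counter()  # (prev, next) -> number of observed switches
    outgoing = Counter()           # prev -> number of switches leaving prev
    durations = defaultdict(list)  # label -> list of segment lengths

    for frame_labels in annotated_videos:
        segments = to_segments(frame_labels)
        for label, length in segments:
            durations[label].append(length)
        for (prev, _), (nxt, _) in zip(segments, segments[1:]):
            transition_counts[(prev, nxt)] += 1
            outgoing[prev] += 1

    transition_conf = {
        pair: count / outgoing[pair[0]]
        for pair, count in transition_counts.items()
    }
    boundary_set = set(transition_counts)
    mean_duration = {label: sum(v) / len(v) for label, v in durations.items()}
    return transition_conf, boundary_set, mean_duration
```

Because these statistics only require counting over existing annotations, they can be computed once per dataset and reused at inference time without touching the trained model.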

📝 Abstract

Temporal action segmentation (TAS) divides untrimmed videos into labeled action segments. While fully supervised methods have advanced the field, challenges such as action variability, ambiguous boundaries, and high annotation costs remain, especially in new or low-resource domains. Grammar-based approaches improve segmentation with structural priors but rely on complex parsing, which limits scalability. In this work, we propose a lightweight, constraint-based refinement framework that enhances TAS predictions by integrating statistical structural priors, such as transition confidence, action boundary sets, and per-class duration, which can be extracted directly from annotated data. These constraints are integrated into a modified Viterbi decoding algorithm, allowing inference-time refinement without retraining or added model complexity. Our approach improves both fully and semi-supervised TAS models by correcting structural prediction errors while maintaining high efficiency. Code is available at https://github.com/LUNAProject22/CAD

Problem

Research questions and friction points this paper is trying to address.

Temporal Action Segmentation

Action Variability

Ambiguous Boundaries

Annotation Costs

Structural Priors

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constraint-Aware Decoding

Temporal Action Segmentation

Structural Priors