🤖 AI Summary
This work addresses two challenges in skeleton-based temporal action segmentation, weak discriminability between adjacent actions and ambiguous action boundaries, by introducing frequency-domain analysis into the task, which the authors present as a novel paradigm. The proposed frequency-aware model has three components: an adaptive multi-scale spectral filter that acts as a “scalpel” to edit frequency components, a discrepancy loss between adjacent actions that increases inter-class separability, and a frequency-aware channel mixer that aggregates spectra across channels. By suppressing frequency components shared between adjacent actions while amplifying action-specific ones, the method sharpens action boundaries. Experiments on five public datasets show state-of-the-art performance, with reduced boundary localization ambiguity and inter-class confusion.
📝 Abstract
Skeleton-based Temporal Action Segmentation (STAS) seeks to densely segment and classify diverse actions within long, untrimmed skeletal motion sequences. However, existing STAS methodologies face challenges of limited inter-class discriminability and blurred segmentation boundaries, primarily due to insufficient distinction of spatio-temporal patterns between adjacent actions. To address these limitations, we propose Spectral Scalpel, a frequency-selective filtering framework aimed at suppressing shared frequency components between adjacent distinct actions while amplifying their action-specific frequencies, thereby enhancing inter-action discrepancies and sharpening transition boundaries. Specifically, Spectral Scalpel employs adaptive multi-scale spectral filters as scalpels to edit frequency spectra, coupled with a discrepancy loss between adjacent actions serving as the surgical objective. This design amplifies representational disparities between neighboring actions, effectively mitigating boundary localization ambiguities and inter-class confusion. Furthermore, complementing long-term temporal modeling, we introduce a frequency-aware channel mixer to strengthen channel evolution by aggregating spectra across channels. This work presents a novel paradigm for STAS that extends conventional spatio-temporal modeling by incorporating frequency-domain analysis. Extensive experiments on five public datasets demonstrate that Spectral Scalpel achieves state-of-the-art performance. Code is available at https://github.com/HaoyuJi/SpecScalpel.
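To make the core idea concrete, the following is a minimal NumPy sketch of frequency-selective filtering along the time axis. It is not the authors' implementation: the function names (`spectral_filter`, `spectral_similarity`), the hand-set gain vector, and the cosine-similarity measure are all assumptions chosen for illustration. In Spectral Scalpel the filters are adaptive and multi-scale, and the discrepancy between adjacent actions is a training loss; here a fixed gain simply zeroes one frequency bin that two toy segments share.

```python
import numpy as np

def spectral_filter(x, gain):
    """Scale each temporal frequency bin of x by a real-valued gain.

    x:    array of shape (T, C), T frames and C feature channels.
    gain: array of shape (T // 2 + 1,) or (T // 2 + 1, C).
    Hypothetical helper; the paper's filters are learned, not hand-set.
    """
    T = x.shape[0]
    spec = np.fft.rfft(x, axis=0)              # (T // 2 + 1, C), complex
    gain = np.asarray(gain, dtype=float)
    if gain.ndim == 1:
        gain = gain[:, None]                   # broadcast over channels
    return np.fft.irfft(spec * gain, n=T, axis=0)

def spectral_similarity(seg_a, seg_b):
    """Cosine similarity between the amplitude spectra of two
    equal-length segments (an illustrative stand-in, not the paper's loss)."""
    a = np.abs(np.fft.rfft(seg_a, axis=0)).ravel()
    b = np.abs(np.fft.rfft(seg_b, axis=0)).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Toy data: two adjacent "actions" that share a 2-cycle component and
# differ in a 5-cycle versus a 9-cycle component.
T = 64
t = np.arange(T) / T
shared = np.sin(2 * np.pi * 2 * t)
action_a = (shared + 0.5 * np.sin(2 * np.pi * 5 * t))[:, None]   # (T, 1)
action_b = (shared + 0.5 * np.sin(2 * np.pi * 9 * t))[:, None]   # (T, 1)

# Hand-set gain that removes the shared bin and keeps everything else.
gain = np.ones(T // 2 + 1)
gain[2] = 0.0

sim_before = spectral_similarity(action_a, action_b)
sim_after = spectral_similarity(spectral_filter(action_a, gain),
                                spectral_filter(action_b, gain))
print(f"similarity before filtering: {sim_before:.3f}")
print(f"similarity after filtering:  {sim_after:.3f}")
```

In this toy setting the spectral similarity of the two segments drops once the shared bin is removed, which is the effect the abstract describes as enhancing inter-action discrepancies. A learned version would replace the fixed `gain` with trainable parameters and minimize a similarity term of this kind between neighboring action segments.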