🤖 AI Summary
Skeleton-based action recognition has long suffered from scarce labeled data and from the difficulty of modeling both short- and long-range temporal dependencies. To address these issues, this paper proposes LSTC-MDA, a unified framework with two core components: (1) a Long-Short Term Temporal Convolution (LSTC) module whose parallel short- and long-term branches are adaptively aligned and fused using learned similarity weights; and (2) an extended Joint Mixing Data Augmentation (JMDA) scheme that adds input-level Additive Mixup and restricts mixing to samples from the same camera view to avoid distribution shift. On the NTU 60, NTU 120, and NW-UCLA benchmarks, LSTC-MDA achieves state-of-the-art accuracies of 94.1% (X-Sub) and 97.5% (X-View), 90.4% (X-Sub) and 92.0% (X-Set), and 97.2%, respectively.
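The view-restricted Additive Mixup can be sketched as follows. This is a minimal illustrative reading, not the paper's exact formulation: the function name, the Beta-distributed mixing coefficient, and the skip-on-cross-view behavior are assumptions for illustration.

```python
import numpy as np

def additive_mixup(x_a, x_b, view_a, view_b, alpha=1.0, rng=None):
    """Input-level Additive Mixup restricted to same-view pairs (sketch).

    x_a, x_b: skeleton tensors, e.g. shape (C, T, V) = (channels, frames, joints).
    view_a, view_b: camera-view identifiers of the two samples.
    """
    if view_a != view_b:
        # Cross-view mixing is skipped to avoid distribution shift.
        return x_a
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient in (0, 1)
    return lam * x_a + (1.0 - lam) * x_b
```

In practice the mixed sample would be paired with a correspondingly mixed label; the sketch above shows only the input-level operation.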
📝 Abstract
Skeleton-based action recognition faces two longstanding challenges: the scarcity of labeled training samples and the difficulty of modeling short- and long-range temporal dependencies. To address these issues, we propose a unified framework, LSTC-MDA, which simultaneously improves temporal modeling and data diversity. We introduce a novel Long-Short Term Temporal Convolution (LSTC) module with parallel short- and long-term branches; the two feature streams are aligned and fused adaptively using learned similarity weights, preserving critical long-range cues that conventional stride-2 temporal convolutions discard. We also extend Joint Mixing Data Augmentation (JMDA) with Additive Mixup at the input level, diversifying training samples while restricting mixup operations to the same camera view to avoid distribution shifts. Ablation studies confirm that each component contributes. LSTC-MDA achieves state-of-the-art results: 94.1% and 97.5% on NTU 60 (X-Sub and X-View), 90.4% and 92.0% on NTU 120 (X-Sub and X-Set), and 97.2% on NW-UCLA. Code: https://github.com/xiaobaoxia/LSTC-MDA.
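The similarity-weighted alignment of the two temporal branches might look like the sketch below. This is one plausible reading under stated assumptions: the paper learns its weights end-to-end, whereas here the weights are derived from cosine similarity to the branch average purely for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def similarity_weighted_fusion(f_short, f_long, eps=1e-8):
    """Fuse short- and long-term feature maps of shape (C, T) (sketch).

    Each branch gets a scalar weight from its cosine similarity to the
    branch average; the weights are normalized with a softmax and used
    in a convex combination of the two branches.
    """
    ref = 0.5 * (f_short + f_long)

    def cos(a, b):
        return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

    w = softmax(np.array([cos(f_short, ref), cos(f_long, ref)]))
    return w[0] * f_short + w[1] * f_long
```

The design intent is that neither branch is discarded outright: a branch whose features agree less with the consensus is down-weighted rather than dropped, so long-range cues survive the fusion.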