🤖 AI Summary
To address insufficient long-range temporal modeling in skeleton-based gait emotion recognition, this paper proposes CGTGait, a dual-stream architecture that collaboratively integrates graph convolution and transformers. The method stacks lightweight CGT blocks with a bidirectional cross-stream fusion mechanism to jointly model joint-level spatial topology and global temporal dependencies, enhancing discriminative spatiotemporal feature representation while reducing computational overhead. Evaluated on the Emotion-Gait and ELMD datasets, CGTGait achieves state-of-the-art or competitive accuracy at an inference cost of only 0.34G FLOPs, roughly 82.2% lower than the baseline, substantially improving the efficiency–accuracy trade-off. The core contributions are the first effective integration of graph convolutional networks and transformers for gait emotion recognition, together with empirical evidence that bidirectional cross-stream fusion is critical for learning complementary spatiotemporal features.
📝 Abstract
Skeleton-based gait emotion recognition has received significant attention due to its wide-ranging applications. However, existing methods primarily focus on extracting spatial and local temporal motion information, failing to capture long-range temporal representations. In this paper, we propose CGTGait, a novel framework that collaboratively integrates graph convolution and transformers to extract discriminative spatiotemporal features for gait emotion recognition. Specifically, CGTGait consists of multiple CGT blocks, where each block employs graph convolution to capture frame-level spatial topology and a transformer to model global temporal dependencies. Additionally, we introduce a Bidirectional Cross-Stream Fusion (BCSF) module to effectively aggregate posture and motion spatiotemporal features, facilitating the exchange of complementary information between the two streams. We evaluate our method on two widely used datasets, Emotion-Gait and ELMD, demonstrating that CGTGait achieves state-of-the-art or at least competitive performance while reducing computational complexity by approximately 82.2% (requiring only 0.34G FLOPs) during testing. Code is available at https://github.com/githubzjj1/CGTGait.
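To make the CGT block concrete, the sketch below shows the general pattern the abstract describes: a frame-level graph convolution over the skeleton topology followed by self-attention across time. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation; the real blocks presumably add multi-head attention, normalization, learned adjacency, and the posture/motion dual streams with BCSF fusion, and all names here (`cgt_block`, `w_gc`, etc.) are hypothetical.

```python
import numpy as np

def graph_conv(x, adj, w):
    """Frame-level spatial graph convolution.

    x:   (T, J, C) sequence of T frames, J joints, C channels
    adj: (J, J)    normalized skeleton adjacency
    w:   (C, C)    channel projection
    """
    # Aggregate each joint's neighbors within every frame, then project.
    return np.einsum('ij,tjc,cd->tid', adj, x, w)

def temporal_self_attention(x, wq, wk, wv):
    """Per-joint self-attention across all T frames (global temporal modeling)."""
    _, _, c = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv                  # (T, J, C) each
    q, k, v = (np.transpose(a, (1, 0, 2)) for a in (q, k, v))  # (J, T, C)
    scores = q @ np.transpose(k, (0, 2, 1)) / np.sqrt(c)       # (J, T, T)
    scores -= scores.max(axis=-1, keepdims=True)      # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = attn @ v                                    # (J, T, C)
    return np.transpose(out, (1, 0, 2))               # back to (T, J, C)

def cgt_block(x, adj, params):
    """One illustrative CGT block: spatial GCN, then temporal attention,
    each with a residual connection."""
    h = x + graph_conv(x, adj, params['w_gc'])
    return h + temporal_self_attention(h, params['wq'], params['wk'], params['wv'])
```

A full model would stack several such blocks in each of the posture and motion streams and exchange features between the streams through BCSF before classification.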