CGTGait: Collaborative Graph and Transformer for Gait Emotion Recognition

📅 2025-09-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address insufficient long-range temporal modeling in skeleton-based gait emotion recognition, this paper proposes CGTGait, a dual-stream framework built from Collaborative Graph convolution and Transformer (CGT) blocks. The method combines lightweight CGT blocks with a Bidirectional Cross-Stream Fusion (BCSF) mechanism to jointly model joint-level spatial topology and global temporal dependencies, enhancing discriminative spatiotemporal feature representation while reducing computational overhead. Evaluated on the Emotion-Gait and ELMD datasets, CGTGait achieves state-of-the-art or competitive accuracy at only 0.34G FLOPs of inference cost, approximately 82.2% lower than baseline methods, markedly improving the efficiency-accuracy trade-off. The core contributions are the first effective integration of graph convolutional networks and transformers for gait emotion recognition, and empirical evidence that bidirectional cross-stream fusion is critical for learning complementary spatiotemporal features.
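The block structure described above (per-frame graph convolution followed by temporal self-attention) can be sketched as follows. This is a minimal illustration under assumed tensor shapes, not the authors' implementation; the function name `cgt_block` and the weight matrices are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cgt_block(x, adj, w_g, w_q, w_k, w_v):
    """Toy CGT block. x: (T, J, C) skeleton sequence over T frames and
    J joints; adj: (J, J) normalized joint adjacency matrix."""
    # Spatial step: per-frame graph convolution over the joint topology.
    h = np.einsum('ij,tjc->tic', adj, x) @ w_g           # (T, J, C)
    # Temporal step: pool joints into frame tokens, then self-attend
    # across all frames (transformer-style global temporal modeling).
    tokens = h.mean(axis=1)                              # (T, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))       # (T, T)
    temporal = attn @ v                                  # (T, C)
    # Broadcast the global temporal context back to every joint (residual).
    return h + temporal[:, None, :]
```

Stacking several such blocks, as the paper does, lets spatial topology and long-range temporal dependencies refine each other layer by layer.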

📝 Abstract
Skeleton-based gait emotion recognition has received significant attention due to its wide-ranging applications. However, existing methods primarily focus on extracting spatial and local temporal motion information, failing to capture long-range temporal representations. In this paper, we propose CGTGait, a novel framework that collaboratively integrates graph convolution and transformers to extract discriminative spatiotemporal features for gait emotion recognition. Specifically, CGTGait consists of multiple CGT blocks, where each block employs graph convolution to capture frame-level spatial topology and the transformer to model global temporal dependencies. Additionally, we introduce a Bidirectional Cross-Stream Fusion (BCSF) module to effectively aggregate posture and motion spatiotemporal features, facilitating the exchange of complementary information between the two streams. We evaluate our method on two widely used datasets, Emotion-Gait and ELMD, demonstrating that our CGTGait achieves state-of-the-art or at least competitive performance while reducing computational complexity by approximately 82.2% (only requiring 0.34G FLOPs) during testing. Code is available at https://github.com/githubzjj1/CGTGait.
Problem

Research questions and friction points this paper is trying to address.

Recognizing emotions from skeleton-based gait data
Capturing long-range temporal dependencies in gait sequences
Reducing computational complexity in gait emotion recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph convolution captures frame-level spatial topology
Transformer models global temporal dependencies effectively
Bidirectional module fuses posture and motion features
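The bidirectional fusion idea in the last bullet can be sketched as a simple additive exchange between the posture and motion streams. This is a toy illustration under assumed shapes, not the paper's BCSF module; the function name `bcsf` and the projection matrices `w_pm` and `w_mp` are hypothetical.

```python
import numpy as np

def bcsf(posture, motion, w_pm, w_mp):
    """Toy bidirectional cross-stream fusion over (T, C) feature sequences."""
    # Posture -> motion: enrich the motion stream with projected posture cues.
    motion_new = motion + np.tanh(posture @ w_pm)
    # Motion -> posture: enrich the posture stream with projected motion cues.
    posture_new = posture + np.tanh(motion @ w_mp)
    # Concatenate the two enriched streams into one fused representation.
    return np.concatenate([posture_new, motion_new], axis=-1)
```

The key property, which the paper validates empirically for its own BCSF design, is that information flows in both directions before fusion, so each stream can compensate for what the other misses.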
Junjie Zhou
Nanjing University
Computer Vision, Machine Learning

Haijun Xiong
Huazhong University of Science and Technology, China

Junhao Lu
Hefei University of Technology, China

Ziyu Lin
National University of Singapore; Singapore Management University
Network Security / Web Security / System Security

Bin Feng
Huazhong University of Science and Technology, China