CGTGait: Collaborative Graph and Transformer for Gait Emotion Recognition

📅 2025-09-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address insufficient long-range temporal modeling in skeleton-based gait emotion recognition, this paper proposes CGTGait, a dual-stream framework built from Collaborative Graph convolution and Transformer (CGT) blocks. The method combines lightweight CGT blocks with a Bidirectional Cross-Stream Fusion (BCSF) mechanism to jointly model joint-level spatial topology and global temporal dependencies, enhancing discriminative spatiotemporal feature representation while reducing computational overhead. Evaluated on the Emotion-Gait and ELMD datasets, CGTGait achieves state-of-the-art or competitive accuracy at only 0.34G FLOPs of inference cost, approximately 82.2% lower than baseline methods, markedly improving the efficiency-accuracy trade-off. The core contributions are the first effective integration of graph convolutional networks and transformers for gait emotion recognition, and empirical evidence that bidirectional cross-stream fusion is critical for learning complementary spatiotemporal features.
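The block structure described above (per-frame graph convolution followed by temporal self-attention) can be sketched as follows. This is a minimal illustration under assumed tensor shapes, not the authors' implementation; the function name `cgt_block` and the weight matrices are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cgt_block(x, adj, w_g, w_q, w_k, w_v):
    """Toy CGT block. x: (T, J, C) skeleton sequence over T frames and
    J joints; adj: (J, J) normalized joint adjacency matrix."""
    # Spatial step: per-frame graph convolution over the joint topology.
    h = np.einsum('ij,tjc->tic', adj, x) @ w_g           # (T, J, C)
    # Temporal step: pool joints into frame tokens, then self-attend
    # across all frames (transformer-style global temporal modeling).
    tokens = h.mean(axis=1)                              # (T, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))       # (T, T)
    temporal = attn @ v                                  # (T, C)
    # Broadcast the global temporal context back to every joint (residual).
    return h + temporal[:, None, :]
```

Stacking several such blocks, as the paper does, lets spatial topology and long-range temporal dependencies refine each other layer by layer.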

📝 Abstract
Skeleton-based gait emotion recognition has received significant attention due to its wide-ranging applications. However, existing methods primarily focus on extracting spatial and local temporal motion information, failing to capture long-range temporal representations. In this paper, we propose CGTGait, a novel framework that collaboratively integrates graph convolution and transformers to extract discriminative spatiotemporal features for gait emotion recognition. Specifically, CGTGait consists of multiple CGT blocks, where each block employs graph convolution to capture frame-level spatial topology and the transformer to model global temporal dependencies. Additionally, we introduce a Bidirectional Cross-Stream Fusion (BCSF) module to effectively aggregate posture and motion spatiotemporal features, facilitating the exchange of complementary information between the two streams. We evaluate our method on two widely used datasets, Emotion-Gait and ELMD, demonstrating that our CGTGait achieves state-of-the-art or at least competitive performance while reducing computational complexity by approximately 82.2% (only requiring 0.34G FLOPs) during testing. Code is available at https://github.com/githubzjj1/CGTGait.
Problem

Research questions and friction points this paper is trying to address.

Recognizing emotions from skeleton-based gait data
Capturing long-range temporal dependencies in gait sequences
Reducing computational complexity in gait emotion recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph convolution captures frame-level spatial topology
Transformer models global temporal dependencies effectively
Bidirectional module fuses posture and motion features
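The bidirectional fusion idea in the last bullet can be sketched as a simple additive exchange between the posture and motion streams. This is a toy illustration under assumed shapes, not the paper's BCSF module; the function name `bcsf` and the projection matrices `w_pm` and `w_mp` are hypothetical.

```python
import numpy as np

def bcsf(posture, motion, w_pm, w_mp):
    """Toy bidirectional cross-stream fusion over (T, C) feature sequences."""
    # Posture -> motion: enrich the motion stream with projected posture cues.
    motion_new = motion + np.tanh(posture @ w_pm)
    # Motion -> posture: enrich the posture stream with projected motion cues.
    posture_new = posture + np.tanh(motion @ w_mp)
    # Concatenate the two enriched streams into one fused representation.
    return np.concatenate([posture_new, motion_new], axis=-1)
```

The key property, which the paper validates empirically for its own BCSF design, is that information flows in both directions before fusion, so each stream can compensate for what the other misses.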
Junjie Zhou
Nanjing University
Computer Vision, Machine Learning

Haijun Xiong
Huazhong University of Science and Technology, China

Junhao Lu
Hefei University of Technology, China

Ziyu Lin
National University of Singapore; Singapore Management University
Network Security / Web Security / System Security

Bin Feng
Huazhong University of Science and Technology, China