🤖 AI Summary
Prior work on micro-gesture emotion recognition remains limited, particularly in modeling fine-grained affective dynamics from skeletal sequences.
Method: This paper proposes a hypergraph-enhanced Transformer framework, the first to apply hypergraph modeling to skeleton-based micro-gesture emotion analysis. A hypergraph self-attention module with progressively updated hyperedges explicitly captures high-order, time-varying joint interactions; multi-scale temporal convolutions and a self-supervised reconstruction decoder encode the subtle motion patterns of micro-gestures; and the emotion classification head, attached to the encoder, is jointly optimized end to end with the reconstruction task.
Results: Evaluated on the iMiGUE and SMG benchmarks, the method achieves state-of-the-art performance, outperforming existing approaches in accuracy, macro-F1, and other key metrics, which demonstrates the efficacy of hypergraph structures for modeling micro-gesture-level emotional states.
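The hybrid-supervised objective described above combines a supervised classification loss with a self-supervised reconstruction loss. A minimal pure-Python sketch of such a joint loss is shown below; the specific loss terms (cross-entropy plus mean-squared reconstruction error) and the weighting factor `lam` are our assumptions, not details confirmed by the paper.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    # supervised loss for the emotion classification head
    return -math.log(softmax(logits)[label])

def mse(pred, target):
    # self-supervised loss for reconstructing the skeleton sequence
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def joint_loss(logits, label, recon, target, lam=1.0):
    # one-stage training: classification + lam * reconstruction
    # (lam is a hypothetical balancing weight)
    return cross_entropy(logits, label) + lam * mse(recon, target)
```

With a perfect reconstruction, `joint_loss` reduces to the classification term alone, which is how the supervised signal dominates once the self-supervised task is solved.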
📝 Abstract
Micro-gestures are unconsciously performed body movements that can convey human emotional states, and they are attracting growing research attention as an emerging topic in human behavior understanding and affective computing. However, modeling human emotion from micro-gestures has not been sufficiently explored. In this work, we propose to recognize emotional states from micro-gestures by reconstructing behavior patterns with a hypergraph-enhanced Transformer in a hybrid-supervised framework. In this framework, a hypergraph-Transformer-based encoder and decoder are designed separately by stacking hypergraph-enhanced self-attention and multiscale temporal convolution modules. In particular, to better capture the subtle motion of micro-gestures, the decoder includes additional upsampling operations for a reconstruction task trained in a self-supervised manner. We further propose a hypergraph-enhanced self-attention module in which the hyperedges between skeleton joints are gradually updated to represent the relationships among body joints and thereby model subtle local motion. Finally, to exploit the relationship between emotional states and the local motion of micro-gestures, a shallow emotion recognition head is attached to the encoder output and trained in a supervised way. The end-to-end framework is jointly trained in a single stage, comprehensively utilizing both self-reconstruction and supervision signals. The proposed method is evaluated on two publicly available datasets, iMiGUE and SMG, and achieves the best performance under multiple metrics, surpassing existing methods.
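To make the hypergraph-enhanced self-attention idea concrete, the pure-Python sketch below biases standard attention scores with a hypergraph adjacency built from a joint-by-hyperedge incidence matrix, so joints sharing a hyperedge attend more strongly to one another. This is only an illustrative simplification under our own assumptions (Q = K = V = X, a fixed incidence matrix `H`, and a scalar bias weight); the paper's module learns and progressively updates the hyperedges.

```python
import math

def matmul(A, B):
    # plain list-of-lists matrix product
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax_row(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def hypergraph_attention(X, H, bias_weight=1.0):
    """X: joints x dim features; H: joints x hyperedges incidence matrix.

    The attention logits X X^T / sqrt(d) are biased by the high-order
    adjacency A = H H^T, so joints in the same hyperedge get larger
    attention weights. bias_weight is a hypothetical scaling factor.
    """
    d = len(X[0])
    Ht = [list(col) for col in zip(*H)]
    A = matmul(H, Ht)                                    # high-order joint adjacency
    scores = matmul(X, [list(col) for col in zip(*X)])   # Q = K = X for brevity
    biased = [[s / math.sqrt(d) + bias_weight * a
               for s, a in zip(srow, arow)]
              for srow, arow in zip(scores, A)]
    weights = [softmax_row(r) for r in biased]
    return matmul(weights, X)                            # V = X for brevity
```

Progressively updating the hyperedges, as the paper proposes, would amount to making `H` a learnable matrix refreshed layer by layer rather than a fixed input.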