🤖 AI Summary
This work studies micro-gestures (MGs), unintentional body movements driven by inner feelings rather than deliberate communicative gestures, and rethinks both how to recognize them and what they contribute to emotional understanding. Methodologically, it explores augmentation strategies suited to the subtle spatial and brief, often repetitive temporal characteristics of MGs, and proposes a simple, plug-and-play spatiotemporal balanced fusion module; it also constructs complex emotional reasoning scenarios evaluated with large language models. The contributions are threefold: (1) it questions whether strategies designed for general action recognition transfer directly to MGs, showing that MG-specific design matters; (2) the proposed approach achieves state-of-the-art performance on micro-gesture recognition while also performing well on mainstream action datasets; (3) LLM-based evaluation shows that MGs play a significant, positive role in comprehensive emotional understanding, with reasoning scenarios extensible to downstream tasks such as deception detection and interviews.
📝 Abstract
In this work, we focus on a special group of human body language: the micro-gesture (MG). Unlike ordinary illustrative gestures, micro-gestures are not intentional behaviors performed to convey information to others, but unintentional behaviors driven by inner feelings. This characteristic raises two challenges worth rethinking. The first is whether strategies designed for general action recognition are fully applicable to micro-gestures. The second is whether micro-gestures, as supplementary data, can provide additional insight for emotional understanding. For micro-gesture recognition, we explore various augmentation strategies that account for the subtle spatial and brief, often repetitive temporal characteristics of micro-gestures, in order to identify more suitable augmentation methods. Given the significance of temporal information for micro-gestures, we introduce a simple and efficient plug-and-play spatiotemporal balanced fusion method. We evaluate our method not only on the micro-gesture dataset under consideration but also on mainstream action datasets. The results show that our approach performs well on micro-gesture recognition, achieving state-of-the-art performance compared with previous micro-gesture recognition methods, and generalizes to other datasets. For emotional understanding based on micro-gestures, we construct complex emotional reasoning scenarios. Our evaluation with large language models shows that micro-gestures play a significant and positive role in enhancing comprehensive emotional understanding. The scenarios we develop can be extended to other micro-gesture-based tasks such as deception detection and interviews. We believe these new insights will help advance research on micro-gestures and emotional artificial intelligence.
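To make the idea of a spatiotemporal balancing fusion concrete, the sketch below shows one minimal way such a plug-and-play step could look. This is an illustrative assumption, not the paper's actual module: it splits per-frame clip features into a temporally pooled "spatial" component and a residual "temporal" component, then reblends them with a balancing weight `alpha` (a hypothetical parameter) that lets the temporal dynamics, which matter most for brief, repetitive micro-gestures, be upweighted.

```python
import numpy as np

def balanced_fusion(clip_feats: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical spatiotemporal balanced fusion (a sketch, not the
    paper's exact method).

    clip_feats: array of shape (frames, channels) holding per-frame
    features of one video clip.
    alpha: balance weight; higher values emphasize temporal dynamics.
    """
    # Temporally pooled context: what stays constant across the clip.
    spatial = clip_feats.mean(axis=0, keepdims=True)
    # Residual dynamics: how each frame deviates from that context.
    temporal = clip_feats - spatial
    # Reblend; alpha > 0.5 upweights the temporal component.
    return alpha * temporal + (1 - alpha) * spatial

feats = np.random.randn(16, 64)            # 16 frames, 64-dim features
fused = balanced_fusion(feats, alpha=0.7)  # same shape as the input
```

Because the output keeps the input shape, the step can be dropped between existing backbone stages without architectural changes, which is what "plug-and-play" suggests here.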