🤖 AI Summary
This work addresses the performance degradation of micro-gesture recognition in low-data, noisy, and cross-subject settings, which stems from the gestures' small amplitude, short duration, and high inter-individual variability. The authors propose an interpretable recognition framework grounded in active inference. The model uses Expected Free Energy (EFE) to dynamically select the most discriminative temporal segments, and an adaptive learning mechanism weights samples by predictive uncertainty to mitigate the effects of label noise and distribution shift. Evaluated on the SMG dataset, the approach yields consistent improvements across multiple mainstream backbone networks. Ablation studies confirm that both EFE-guided sampling and uncertainty-weighted learning are crucial to the gains.
📝 Abstract
Micro-gestures are subtle and transient movements triggered by unconscious neural and emotional activities, holding great potential for human-computer interaction and clinical monitoring. However, their low amplitude, short duration, and strong inter-subject variability make existing deep models prone to degradation under low-sample, noisy, and cross-subject conditions. This paper presents an active inference-based framework for micro-gesture recognition, featuring Expected Free Energy (EFE)-guided temporal sampling and uncertainty-aware adaptive learning. The model actively selects the most discriminative temporal segments under EFE guidance, enabling dynamic observation and information gain maximization. Meanwhile, sample weighting driven by predictive uncertainty mitigates the effects of label noise and distribution shift. Experiments on the SMG dataset demonstrate the effectiveness of the proposed method, achieving consistent improvements across multiple mainstream backbones. Ablation studies confirm that both the EFE-guided observation and the adaptive learning mechanism are crucial to the performance gains. This work offers an interpretable and scalable paradigm for temporal behavior modeling under low-resource and noisy conditions, with broad applicability to wearable sensing, HCI, and clinical emotion monitoring.
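The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact EFE formulation or weighting rule, so everything below is an assumption made for illustration: the EFE of a candidate segment is approximated by its negative epistemic value (a mutual-information estimate over several stochastic predictions, with the pragmatic term omitted), and each sample's weight is one minus the normalized entropy of its prediction. The function names (`select_segment`, `uncertainty_weight`, `weighted_nll`) are hypothetical and not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def mean_dist(dists):
    """Element-wise mean of several distributions over the same classes."""
    n = len(dists)
    return [sum(d[k] for d in dists) / n for k in range(len(dists[0]))]

def epistemic_gain(mc_preds):
    """Mutual-information estimate: entropy of the mean prediction minus
    the mean entropy of the individual stochastic predictions."""
    mean_entropy = sum(entropy(p) for p in mc_preds) / len(mc_preds)
    return entropy(mean_dist(mc_preds)) - mean_entropy

def select_segment(candidates):
    """Pick the segment with the lowest surrogate EFE.

    ASSUMPTION: EFE is approximated as the negative epistemic gain only;
    the paper's actual EFE may include further terms.
    `candidates[i]` holds several stochastic class predictions for segment i.
    """
    efe = [-epistemic_gain(mc) for mc in candidates]
    best = min(range(len(efe)), key=efe.__getitem__)
    return best, efe

def uncertainty_weight(p):
    """ASSUMPTION: weight = 1 - normalized entropy, so confident
    predictions get weight near 1 and uniform predictions weight 0."""
    return 1.0 - entropy(p) / math.log(len(p))

def weighted_nll(preds, labels):
    """Negative log-likelihood averaged with uncertainty weights."""
    weights = [uncertainty_weight(p) for p in preds]
    losses = [-math.log(max(p[y], 1e-12)) for p, y in zip(preds, labels)]
    total = max(sum(weights), 1e-12)  # guard against all-zero weights
    return sum(w * l for w, l in zip(weights, losses)) / total

# Two candidate segments, each with two stochastic predictions over 2 classes.
candidates = [
    [[0.5, 0.5], [0.5, 0.5]],  # predictions agree: nothing to learn
    [[0.9, 0.1], [0.1, 0.9]],  # predictions disagree: informative segment
]
best, efe = select_segment(candidates)
loss = weighted_nll([[1.0, 0.0], [0.5, 0.5]], [0, 1])
```

In this toy case the second segment is selected because its stochastic predictions disagree, and the maximally uncertain sample contributes nothing to the loss. Other weighting schemes, such as down-weighting by predictive variance, would fit the abstract's description equally well.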