🤖 AI Summary
To address the low temporal resolution and motion-blur susceptibility of RGB cameras in micro-expression analysis, coupled with the scarcity of event-camera micro-expression data, this work introduces the first synchronized RGB-event multimodal micro-expression dataset featuring multi-resolution capabilities. We further propose a spiking neural network (SNN)-based action unit (AU) classification framework and a conditional variational autoencoder (CVAE)-driven high-fidelity frame reconstruction method. Experimental results demonstrate that the event-based AU classification achieves 51.23% accuracy—substantially outperforming the RGB baseline (23.12%). The reconstructed frames attain SSIM = 0.8513 and PSNR = 26.89 dB. This study provides the first systematic validation of event cameras for modeling the spatiotemporal dynamics of micro-expressions, establishing a novel paradigm for high-accuracy, motion-robust micro-expression recognition and interpretable, high-fidelity reconstruction.
📝 Abstract
Micro-expression analysis has applications in domains such as Human-Robot Interaction and Driver Monitoring Systems. Accurately capturing subtle and fast facial movements remains difficult when relying solely on RGB cameras, due to limitations in temporal resolution and sensitivity to motion blur. Event cameras offer an alternative, with microsecond-level precision, high dynamic range, and low latency. However, public datasets featuring event-based recordings of Action Units are still scarce. In this work, we introduce a novel, preliminary multi-resolution and multi-modal micro-expression dataset recorded with synchronized RGB and event cameras under variable lighting conditions. Two baseline tasks are evaluated to explore the spatial-temporal dynamics of micro-expressions: Action Unit classification using Spiking Neural Networks (51.23% accuracy with events vs. 23.12% with RGB), and frame reconstruction using Conditional Variational Autoencoders, achieving SSIM = 0.8513 and PSNR = 26.89 dB with high-resolution event input. These promising results show that event-based data can be used for micro-expression recognition and frame reconstruction.