Exploring Spatial-Temporal Dynamics in Event-based Facial Micro-Expression Analysis

📅 2025-08-16

📈 Citations: 0

✨ Influential: 0

career value

225K/year

🤖 AI Summary

To address the low temporal resolution and motion-blur susceptibility of RGB cameras in micro-expression analysis, coupled with the scarcity of event-camera micro-expression data, this work introduces the first synchronized RGB-event multimodal micro-expression dataset featuring multi-resolution capabilities. We further propose a spiking neural network (SNN)-based action unit (AU) classification framework and a conditional variational autoencoder (CVAE)-driven high-fidelity frame reconstruction method. Experimental results demonstrate that the event-based AU classification achieves 51.23% accuracy—substantially outperforming the RGB baseline (23.12%). The reconstructed frames attain SSIM = 0.8513 and PSNR = 26.89 dB. This study provides the first systematic validation of event cameras for modeling the spatiotemporal dynamics of micro-expressions, establishing a novel paradigm for high-accuracy, motion-robust micro-expression recognition and interpretable, high-fidelity reconstruction.

Technology Category

Application Category

📝 Abstract

Micro-expression analysis has applications in domains such as Human-Robot Interaction and Driver Monitoring Systems. Accurately capturing subtle and fast facial movements remains difficult when relying solely on RGB cameras, due to limitations in temporal resolution and sensitivity to motion blur. Event cameras offer an alternative, with microsecond-level precision, high dynamic range, and low latency. However, public datasets featuring event-based recordings of Action Units are still scarce. In this work, we introduce a novel, preliminary multi-resolution and multi-modal micro-expression dataset recorded with synchronized RGB and event cameras under variable lighting conditions. Two baseline tasks are evaluated to explore the spatial-temporal dynamics of micro-expressions: Action Unit classification using Spiking Neural Networks (51.23% accuracy with events vs. 23.12% with RGB), and frame reconstruction using Conditional Variational Autoencoders, achieving SSIM = 0.8513 and PSNR = 26.89 dB with high-resolution event input. These promising results show that event-based data can be used for micro-expression recognition and frame reconstruction.

Problem

Research questions and friction points this paper is trying to address.

Capturing subtle facial movements with RGB cameras is challenging

Lack of public datasets for event-based micro-expression analysis

Exploring spatial-temporal dynamics in micro-expressions using event cameras

Innovation

Methods, ideas, or system contributions that make the work stand out.

Event cameras for microsecond-level precision

Multi-resolution multi-modal dataset creation

Spiking Neural Networks for AU classification

🔎 Similar Papers

Exploring Facial Biomarkers for Depression through Temporal Analysis of Action Units