🤖 AI Summary
This work addresses robust human action recognition (HAR) under extremely low-light conditions and severe six-degree-of-freedom (6-DoF) camera motion, where conventional vision-based methods typically fail. To overcome this limitation, the authors propose EIS-HAR, a framework that fuses event-camera data with inertial measurement unit (IMU) signals. It combines a joint event-IMU motion compensation module, which reconstructs a motion-compensated input through non-linear warping, with a four-stage hybrid network for spatiotemporal feature extraction. The study also introduces DarkShake-DVS, the first large-scale benchmark that combines low illumination, intense camera shake, and synchronized IMU recordings. Experiments on three datasets show that EIS-HAR consistently outperforms state-of-the-art methods, indicating strong performance and generalization under extreme imaging conditions.
📝 Abstract
Human Action Recognition (HAR) is a fundamental computer vision task with diverse real-world applications. Practical deployments often involve low-light environments and unconstrained 6-DoF camera motion, conditions that degrade visual quality, disrupt temporal coherence, and compromise the reliability of existing methods. Event cameras, which offer high low-light sensitivity and microsecond-level temporal resolution, present a promising solution when paired with an inertial measurement unit (IMU). However, current research faces two key challenges: the absence of a benchmark that integrates low-light conditions, 6-DoF motion, and synchronized IMU data; and the lack of effective motion compensation techniques. To address these, we propose Event-IMU Stabilized HAR (EIS-HAR), which consists of two modules. The first is an EIS module that reduces motion blur by using a non-linear warping function to reconstruct a motion-compensated input. The second is a HAR module with a four-stage hybrid architecture that efficiently extracts spatiotemporal features for accurate action recognition. To alleviate data scarcity, we introduce DarkShake-DVS, the first large-scale event-based HAR benchmark, comprising 18,041 real-world clips captured in low light and under intense 6-DoF motion, supplemented by synchronized IMU data. Extensive experiments on three datasets demonstrate the consistent superiority of EIS-HAR over state-of-the-art methods.
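The abstract does not specify the EIS module's non-linear warping function. As a rough illustration of the general idea of IMU-driven event motion compensation, the sketch below implements a generic rotation-only warp, not the authors' method. It assumes known pinhole intrinsics `K`, an IMU aligned with the camera frame, and a constant gyroscope angular velocity `omega` (rad/s) over the event window. Each event is mapped to a reference time `t_ref` through the homography `K R K^-1`, where the perspective division is the non-linear step, and the warped events are then summed into a count image. The function names `skew`, `rodrigues`, `warp_events_rotational`, and `accumulate` are hypothetical.

```python
import numpy as np

def skew(v):
    """Return the 3x3 skew-symmetric matrix [v]_x such that [v]_x @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rodrigues(rotvec):
    """Rotation matrix for a rotation vector (axis * angle) via Rodrigues' formula."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3) + skew(rotvec)  # first-order approximation near zero rotation
    k = skew(rotvec / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)

def warp_events_rotational(xs, ys, ts, omega, K, t_ref):
    """Warp event pixel coordinates to time t_ref.

    Assumption (hypothetical model): pure camera rotation with constant body-frame
    angular velocity omega over the window; translation is ignored.
    """
    K_inv = np.linalg.inv(K)
    xs_out = np.empty(len(xs))
    ys_out = np.empty(len(ys))
    for i, (x, y, t) in enumerate(zip(xs, ys, ts)):
        # Rotation mapping a bearing observed at time t into the camera frame at t_ref.
        R = rodrigues(omega * (t - t_ref))
        p = K @ R @ K_inv @ np.array([x, y, 1.0])
        xs_out[i] = p[0] / p[2]  # perspective division: the non-linear part of the warp
        ys_out[i] = p[1] / p[2]
    return xs_out, ys_out

def accumulate(xs, ys, height, width):
    """Sum warped events into a count image, dropping events that leave the sensor."""
    xi = np.round(xs).astype(int)
    yi = np.round(ys).astype(int)
    keep = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
    img = np.zeros((height, width))
    np.add.at(img, (yi[keep], xi[keep]), 1.0)
    return img
```

When `omega` matches the true camera rotation, events generated by a static scene point collapse back onto a single pixel, which yields a sharper, motion-compensated event image. A real 6-DoF pipeline would also have to handle translation and scene depth, which this rotation-only model cannot capture.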