🤖 AI Summary
Motion blur caused by camera or scene movement remains challenging to remove, even with event cameras, because event data are non-uniformly distributed and redundant across modalities, which limits deblurring accuracy and detail recovery. To address this, we propose a bioinspired dual-drive hybrid network (BDHNet) that jointly leverages asynchronous event streams and RGB frames. Temporal event modeling employs Spiking Neural Networks (SNNs), while RGB processing uses conventional Artificial Neural Networks (ANNs). Our approach introduces two modules: (i) a Neuron Configurator Module (NCM), which dynamically adjusts neuron configurations based on cross-modal features so that spikes concentrate in blurry regions, and (ii) a Region of Blurry Attention Module (RBAM), inspired by human visual attention, which generates a blur mask in an unsupervised manner to guide cross-modal feature fusion. Evaluated on both synthetic and real-world datasets, our method outperforms current state-of-the-art approaches on objective metrics (e.g., PSNR, SSIM) and in subjective quality, with notable improvements in edge sharpness and fine-texture reconstruction.
📝 Abstract
Motion deblurring addresses image blur caused by camera or scene movement. Event cameras capture motion information and encode it in asynchronous event streams. To exploit the temporal information in these streams efficiently, we employ Spiking Neural Networks (SNNs) for motion feature extraction and Artificial Neural Networks (ANNs) for color information processing. Because event data are non-uniformly distributed and inherently redundant, existing cross-modal feature fusion methods have limitations. Inspired by the visual attention mechanism of the human visual system, this study introduces a bioinspired dual-drive hybrid network (BDHNet). Specifically, the Neuron Configurator Module (NCM) dynamically adjusts neuron configurations based on cross-modal features, thereby concentrating spikes in blurry regions and adapting to varying blur scenarios. Additionally, the Region of Blurry Attention Module (RBAM) generates a blur mask in an unsupervised manner, extracting motion clues from the event features and guiding more accurate cross-modal feature fusion. Extensive subjective and objective evaluations demonstrate that our method outperforms current state-of-the-art methods on both synthetic and real-world datasets.
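
To make the idea of spiking neurons with adjustable configurations more concrete, the following is a minimal NumPy sketch of a generic leaky integrate-and-fire (LIF) layer with a per-neuron firing threshold. It is not the BDHNet implementation: the function name `lif_forward`, the `decay` parameter, the hard-reset rule, and the toy threshold map are all assumptions made for illustration, and the abstract does not specify which neuron parameters the NCM actually adjusts.

```python
import numpy as np


def lif_forward(inputs, threshold, decay=0.5):
    """Run a generic leaky integrate-and-fire layer over T time steps.

    inputs:    array of shape (T, H, W), input current at each time step
    threshold: scalar or array of shape (H, W), firing threshold per neuron
    decay:     fraction of the membrane potential kept from one step to the next
    Returns a binary spike array of shape (T, H, W).
    """
    inputs = np.asarray(inputs, dtype=np.float64)
    threshold = np.broadcast_to(
        np.asarray(threshold, dtype=np.float64), inputs.shape[1:]
    )
    membrane = np.zeros(inputs.shape[1:])
    spikes = np.zeros_like(inputs)
    for t in range(inputs.shape[0]):
        # Leak the previous potential, then integrate the new input.
        membrane = decay * membrane + inputs[t]
        fired = membrane >= threshold
        spikes[t] = fired
        # Hard reset: neurons that fired start again from zero (an assumption).
        membrane = np.where(fired, 0.0, membrane)
    return spikes


# Toy example: identical input everywhere, but a lower threshold in the
# right-hand column (a hypothetical stand-in for a "blurry" region).
inputs = np.full((6, 2, 2), 0.6)
threshold_map = np.array([[1.0, 0.5],
                          [1.0, 0.5]])
counts = lif_forward(inputs, threshold_map).sum(axis=0)
print(counts)
```

In this toy setting the neurons with the lower threshold fire at every step while the others fire only occasionally, which illustrates how changing a neuron's configuration can shift spike density toward selected regions.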