🤖 AI Summary
To address the degradation of RGB-based object detection performance in complex traffic scenarios (e.g., nighttime, tunnels) caused by limited dynamic range and consequent detail loss, this paper proposes a motion-aware fusion network that leverages the high dynamic range of event cameras. Methodologically, it introduces an event correction and dynamic upsampling module for cross-modal spatiotemporal alignment, and a cross-scan multimodal Mamba fusion module that adaptively integrates complementary RGB and event features. The key innovations lie in the integration of optical flow-based alignment, event-driven dynamic upsampling, and state-space modeling (Mamba) into a unified multimodal detection framework. Extensive experiments demonstrate substantial improvements over state-of-the-art methods on the DSEC-Det and PKU-DAVIS-SOD benchmarks: on DSEC-Det, the proposed method gains +7.4% mAP₅₀ and +1.7% mAP over the best prior method.
📝 Abstract
The limited dynamic range of conventional RGB cameras reduces global contrast and causes the loss of high-frequency details such as textures and edges in complex traffic environments (e.g., nighttime driving, tunnels), hindering discriminative feature extraction and degrading frame-based object detection. To address this, we pair a bio-inspired event camera with an RGB camera to provide high-dynamic-range information and propose a motion cue fusion network (MCFNet), which achieves optimal spatiotemporal alignment and adaptive cross-modal feature fusion under challenging lighting. Specifically, an event correction module (ECM) temporally aligns asynchronous event streams with image frames via optical-flow-based warping and is jointly optimized with the detection network to learn task-aware event representations. An event dynamic upsampling module (EDUM) enhances the spatial resolution of event frames to match image structures, ensuring precise spatiotemporal alignment. A cross-modal Mamba fusion module (CMM) performs adaptive feature fusion with a novel interlaced scanning mechanism, effectively integrating complementary information for robust detection. Experiments on the DSEC-Det and PKU-DAVIS-SOD datasets demonstrate that MCFNet significantly outperforms existing methods across diverse poor-lighting and fast-moving traffic scenarios. Notably, on DSEC-Det, MCFNet surpasses the best existing methods by 7.4% in mAP50 and 1.7% in mAP. The code is available at https://github.com/Charm11492/MCFNet.
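The paper's implementation details are in the linked repository; as intuition for the ECM's optical-flow-based warping step, the sketch below shows a generic backward warp of an accumulated event frame toward the RGB frame's timestamp. This is a minimal NumPy illustration, not the authors' code: the function name `warp_events_to_frame`, the single-channel event representation, and the dense per-pixel flow field are all assumptions for the example.

```python
import numpy as np

def warp_events_to_frame(event_frame, flow):
    """Backward-warp an event frame using a dense optical flow field.

    event_frame: (H, W) accumulated event counts at the event timestamp.
    flow: (H, W, 2) per-pixel (dx, dy) displacement, in pixels, mapping each
          target-frame location back to its source location in the event frame.
    Returns the bilinearly resampled event frame aligned to the RGB timestamp.
    """
    h, w = event_frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates, clamped to the image bounds.
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners and fractional weights for bilinear interpolation.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = src_x - x0
    wy = src_y - y0

    top = event_frame[y0, x0] * (1 - wx) + event_frame[y0, x1] * wx
    bot = event_frame[y1, x0] * (1 - wx) + event_frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# Toy usage: with zero flow the warp is the identity.
ev = np.random.rand(8, 8)
flow = np.zeros((8, 8, 2))
warped = warp_events_to_frame(ev, flow)
```

In MCFNet this warping is jointly optimized with the detector, so the learned flow serves the detection objective rather than photometric consistency alone.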