Beyond conventional vision: RGB-event fusion for robust object detection in dynamic traffic scenarios

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the degradation of RGB-based object detection performance in complex traffic scenarios (e.g., nighttime, tunnels) caused by limited dynamic range and consequent detail loss, this paper proposes a motion-aware fusion network that leverages the high dynamic range of event cameras. Methodologically, it introduces an event correction and dynamic upsampling module for cross-modal spatiotemporal alignment, and a cross-scan multimodal Mamba fusion module that adaptively integrates complementary RGB and event features. The key innovations lie in the integration of optical flow-based alignment, event-driven dynamic upsampling, and state-space modeling (Mamba) into a unified multimodal detection framework. Extensive experiments demonstrate substantial improvements over state-of-the-art methods on the DSEC-Det and PKU-DAVIS-SOD benchmarks; on DSEC-Det, the proposed method surpasses the best existing method by +7.4% mAP₅₀ and +1.7% mAP.
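The optical-flow-based alignment step described above can be pictured as warping an accumulated event frame to the RGB frame's timestamp with a flow field. Below is a minimal PyTorch sketch of that idea; the function name `warp_event_frame` and its tensor layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def warp_event_frame(event_frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp an accumulated event frame toward the RGB timestamp using optical flow.

    event_frame: (B, C, H, W) event representation (e.g., a polarity-binned frame)
    flow:        (B, 2, H, W) pixel displacements from the RGB time to the event time
    """
    b, _, h, w = event_frame.shape
    # Base sampling grid in pixel coordinates (x in channel 0, y in channel 1).
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(event_frame.device)  # (2, H, W)
    grid = grid.unsqueeze(0) + flow                                     # displaced grid
    # Normalize to [-1, 1] as expected by grid_sample.
    grid[:, 0] = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid[:, 1] = 2.0 * grid[:, 1] / (h - 1) - 1.0
    grid = grid.permute(0, 2, 3, 1)                                     # (B, H, W, 2)
    return F.grid_sample(event_frame, grid, align_corners=True)
```

In MCFNet this alignment is reported to be jointly optimized with the detection network; in the sketch the flow would simply come from an off-the-shelf estimator.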

📝 Abstract
The dynamic range limitation of conventional RGB cameras reduces global contrast and causes loss of high-frequency details such as textures and edges in complex traffic environments (e.g., nighttime driving, tunnels), hindering discriminative feature extraction and degrading frame-based object detection. To address this, we integrate a bio-inspired event camera with an RGB camera to provide high dynamic range information and propose a motion cue fusion network (MCFNet), which achieves optimal spatiotemporal alignment and adaptive cross-modal feature fusion under challenging lighting. Specifically, an event correction module (ECM) temporally aligns asynchronous event streams with image frames via optical-flow-based warping, jointly optimized with the detection network to learn task-aware event representations. The event dynamic upsampling module (EDUM) enhances the spatial resolution of event frames to match image structures, ensuring precise spatiotemporal alignment. The cross-modal Mamba fusion module (CMM) performs adaptive feature fusion with a novel interlaced scanning mechanism, effectively integrating complementary information for robust detection. Experiments conducted on the DSEC-Det and PKU-DAVIS-SOD datasets demonstrate that MCFNet significantly outperforms existing methods in a variety of poor-lighting and fast-motion traffic scenarios. Notably, on the DSEC-Det dataset, MCFNet achieves a remarkable improvement, surpassing the best existing methods by 7.4% in mAP₅₀ and 1.7% in mAP, respectively. The code is available at https://github.com/Charm11492/MCFNet.
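The interlaced scanning mechanism in the CMM can be read as interleaving RGB and event feature tokens into a single sequence so that the sequence model sees both modalities alternately. The sketch below illustrates only that interleave/de-interleave bookkeeping, with a single GRU layer standing in for the actual Mamba (state-space) block; class and variable names are assumptions for illustration, not taken from the released code.

```python
import torch
import torch.nn as nn

class InterlacedFusion(nn.Module):
    """Toy cross-modal fusion: interlace RGB and event tokens, run a sequence
    mixer over the combined stream, then fuse the de-interlaced streams.
    A GRU is used here as a placeholder where MCFNet would use a Mamba block.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.mixer = nn.GRU(dim, dim, batch_first=True)  # placeholder sequence model
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, rgb_tokens: torch.Tensor, evt_tokens: torch.Tensor) -> torch.Tensor:
        # rgb_tokens, evt_tokens: (B, N, C) flattened spatial features per modality.
        b, n, c = rgb_tokens.shape
        # Interlace as r0, e0, r1, e1, ... so neighbouring tokens alternate modalities.
        seq = torch.stack((rgb_tokens, evt_tokens), dim=2).reshape(b, 2 * n, c)
        mixed, _ = self.mixer(seq)
        # De-interlace and fuse the two modality-specific streams position-wise.
        mixed = mixed.reshape(b, n, 2, c)
        fused = self.out(torch.cat((mixed[:, :, 0], mixed[:, :, 1]), dim=-1))
        return fused  # (B, N, C) fused features for the detection head
```

Usage would be as simple as `fused = InterlacedFusion(256)(rgb_feat, evt_feat)` on flattened backbone features; the interleaving is what lets a 1-D scan attend to both modalities at every spatial position.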
Problem

Research questions and friction points this paper is trying to address.

Overcoming RGB camera limitations in dynamic traffic scenarios
Integrating event cameras for high dynamic range information
Improving object detection in poor lighting and fast motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

RGB-event fusion for dynamic traffic detection
Optical-flow-based event-image alignment
Adaptive cross-modal feature fusion
Authors

Zhanwen Liu — School of Information Engineering, Chang’an University, Xi’an, 710000, China
Yujing Sun — Nanyang Technological University (AI Security & 3DV)
Yang Wang — School of Information Engineering, Chang’an University, Xi’an, 710000, China
Nan Yang — School of Information Engineering, Chang’an University, Xi’an, 710000, China
Shengbo Eben Li — School of Vehicle Mobility & College of AI, Tsinghua University, Beijing, 100084, China
Xiangmo Zhao — School of Information Engineering, Chang’an University, Xi’an, 710000, China