🤖 AI Summary
To address temporal misalignment of point clouds and insufficient exploitation of dynamic information in multi-frame 4D radar–LiDAR fusion for moving objects, this paper proposes a Motion-aware Radar Encoder (MRE) that explicitly corrects temporal offsets in radar point clouds, and a Motion Attention Gated Fusion (MAGF) module that leverages velocity features to steer cross-modal attention toward dynamic foreground regions. Furthermore, a multi-frame radar–LiDAR feature alignment mechanism coupled with cross-attention achieves spatiotemporally consistent feature fusion. Evaluated on the View-of-Delft dataset, the method achieves state-of-the-art performance: 73.30% mAP over the entire annotated area and 88.68% mAP in the driving corridor; pedestrian AP reaches 69.67% (entire area), while cyclist AP attains 96.25% (driving corridor). These results demonstrate substantial gains in robust detection of moving objects in autonomous driving scenarios.
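The temporal-offset correction in the MRE can be pictured as shifting each accumulated past radar frame to the current timestamp using per-point velocity. This is only an illustrative sketch, not the paper's implementation: the function name, the assumption of full 3D velocity vectors (4D radar natively measures radial velocity), and the constant-velocity model are all assumptions made here for clarity.

```python
import numpy as np

def compensate_radar_frame(points, velocities, dt):
    """Hypothetical inter-frame compensation: translate each point of a
    past radar frame forward by its measured velocity times the time
    offset dt, so accumulated frames align at the current timestamp
    (constant-velocity assumption)."""
    # points:     (N, 3) xyz positions from a past frame
    # velocities: (N, 3) per-point velocity estimates (assumed available)
    # dt:         time elapsed between that frame and the current one (s)
    return points + velocities * dt

# Accumulating several past frames into the current coordinate/time frame:
def accumulate_frames(frames, frame_velocities, frame_dts):
    """Stack compensated past frames into one dense point cloud."""
    compensated = [
        compensate_radar_frame(p, v, dt)
        for p, v, dt in zip(frames, frame_velocities, frame_dts)
    ]
    return np.concatenate(compensated, axis=0)
```

A moving object observed in several frames would otherwise appear "smeared" along its trajectory after naive accumulation; the compensation collapses those observations back onto the object's current position.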
📝 Abstract
Reliable autonomous driving systems require accurate detection of traffic participants. To this end, multi-modal fusion has emerged as an effective strategy. In particular, 4D radar and LiDAR fusion methods based on multi-frame radar point clouds have demonstrated effectiveness in bridging the point density gap. However, they often neglect the inter-frame misalignment of radar point clouds caused by object movement during accumulation, and do not fully exploit the dynamic object information from 4D radar. In this paper, we propose MoRAL, a motion-aware multi-frame 4D radar and LiDAR fusion framework for robust 3D object detection. First, a Motion-aware Radar Encoder (MRE) is designed to compensate for inter-frame radar misalignment from moving objects. Then, a Motion Attention Gated Fusion (MAGF) module integrates radar motion features to guide LiDAR features to focus on dynamic foreground objects. Extensive evaluations on the View-of-Delft (VoD) dataset demonstrate that MoRAL outperforms existing methods, achieving the highest mAP of 73.30% in the entire area and 88.68% in the driving corridor. Notably, our method also achieves the best AP of 69.67% for pedestrians in the entire area and 96.25% for cyclists in the driving corridor.
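The gating idea behind the MAGF module can be sketched as follows. This is a minimal illustration under assumptions, not the paper's architecture: the linear projection, sigmoid gate, residual form, and all shapes are hypothetical stand-ins for whatever attention mechanism MoRAL actually uses.

```python
import numpy as np

def motion_gated_fusion(lidar_feat, motion_feat, w, b):
    """Hypothetical motion-attention gate: a per-cell scalar gate derived
    from radar motion (velocity) features re-weights LiDAR BEV features,
    emphasizing cells likely to contain dynamic foreground objects."""
    # lidar_feat:  (H, W, C) LiDAR BEV feature map
    # motion_feat: (H, W, M) radar motion features (e.g. velocity channels)
    # w, b:        (M, 1) projection and scalar bias (assumed learnable)
    gate = 1.0 / (1.0 + np.exp(-(motion_feat @ w + b)))  # (H, W, 1), in (0, 1)
    # Residual form: dynamic cells are amplified while static context is kept.
    return lidar_feat * (1.0 + gate)
```

With zero projection weights the gate is uniformly 0.5 everywhere, i.e. no cell is preferred; training would push the gate toward 1 over moving objects and toward 0 over static background.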