🤖 AI Summary
This work addresses the limitations of multimodal 3D perception in challenging conditions—such as low light, long range, and occlusion—where 4D radar point clouds are sparse and visual data suffer from unreliable texture. To overcome these issues, the authors propose the SDCM framework, which first densifies radar point clouds by integrating Gaussian modeling with curvature-aware simulation. A Radar Compensatory Mapping (RCM) module is then introduced to mitigate visual degradation, followed by the Mamba Modeling Interactive Fusion (MMIF) module for efficient, low-parameter interaction and fusion across heterogeneous modalities. Evaluated on the VoD, TJ4DRadSet, and Astyx HiRes 2019 benchmarks, the proposed method achieves state-of-the-art performance while significantly reducing model parameters and accelerating inference.
📝 Abstract
3-D object detection based on 4-D radar-vision fusion is an important component of the Internet of Vehicles (IoV). However, two challenges must be addressed. First, 4-D radar point clouds are sparse, leading to poor 3-D representation. Second, vision data exhibit representation degradation in low-light, long-distance, and dense-occlusion scenes, providing unreliable texture information during the fusion stage. To address these issues, a framework named SDCM is proposed, which combines Simulated Densifying and Compensatory Modeling Fusion for radar-vision 3-D object detection in IoV. First, the Simulated Densifying (SimDen) module is designed to generate dense radar point clouds, combining point generation based on Gaussian simulation around key points obtained from 3-D Kernel Density Estimation (3-D KDE) with outline generation based on curvature simulation. Second, because the all-weather property of 4-D radar lets it provide more reliable real-time information than vision, the Radar Compensatory Mapping (RCM) module is designed to reduce the effects of the representation degradation of vision data. Third, since feature-tensor difference values contain the effective information of each modality, which can be extracted and modeled to reduce heterogeneity and enable modality interaction, the Mamba Modeling Interactive Fusion (MMIF) module is designed to reduce heterogeneity and achieve interactive fusion. Experimental results on the VoD, TJ4DRadSet, and Astyx HiRes 2019 datasets show that SDCM achieves the best performance with fewer parameters and faster inference speed. Our code will be available.
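To make the SimDen idea concrete, here is a minimal sketch of KDE-based key-point selection followed by Gaussian simulation, using `scipy.stats.gaussian_kde`. This is not the authors' implementation: the function name `simulated_densify`, all parameter values, and the top-k density heuristic for picking key points are assumptions for illustration, and the curvature-based outline generation described in the abstract is omitted.

```python
# Hypothetical sketch of the SimDen densification step (not the paper's code):
# 1) fit a 3-D KDE to a sparse radar point cloud,
# 2) take the densest points as key points,
# 3) sample Gaussian-perturbed points around each key point.
import numpy as np
from scipy.stats import gaussian_kde

def simulated_densify(points, n_keypoints=8, samples_per_key=16, sigma=0.3, seed=0):
    """points: (N, 3) sparse radar returns; returns a densified (M, 3) cloud."""
    rng = np.random.default_rng(seed)
    kde = gaussian_kde(points.T)                 # 3-D kernel density estimate
    density = kde(points.T)                      # density at each input point
    key_idx = np.argsort(density)[-n_keypoints:] # densest points as key points
    keypoints = points[key_idx]
    # Gaussian simulation: jitter each key point to create new nearby points
    new_points = (keypoints[:, None, :] +
                  rng.normal(0.0, sigma, size=(n_keypoints, samples_per_key, 3)))
    return np.concatenate([points, new_points.reshape(-1, 3)], axis=0)

sparse = np.random.default_rng(1).normal(size=(64, 3))
dense = simulated_densify(sparse)
print(dense.shape)  # (192, 3): 64 original points + 8 key points * 16 samples
```

Sampling around high-density key points, rather than uniformly, keeps the generated points on structures the radar actually observed, which matches the abstract's motivation of improving 3-D representation without inventing geometry far from real returns.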