🤖 AI Summary
To address the lack of voxel-level ground truth labels for semantic segmentation of 4D radar data, this paper proposes a cross-modal automatic annotation framework that fuses LiDAR and camera information to generate high-quality voxel-level labels. It further introduces the first end-to-end semantic segmentation model that operates directly on raw 4D radar tensors, implemented as a lightweight 3D convolutional network with a Chamfer distance loss to improve voxel localization accuracy. Key contributions include: (1) the first end-to-end tensor-based semantic segmentation model designed specifically for 4D radar; (2) the first multimodal automatic annotation paradigm to address the scarcity of radar ground truth; and (3) state-of-the-art performance on the RaDelft dataset, with a 13.2% improvement in vehicle detection probability, a 0.54 m reduction in Chamfer distance, and detection performance exceeding 65% of the LiDAR baseline.
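Since both the loss and the reported 0.54 m gain are expressed in terms of Chamfer distance, the standard symmetric form between a predicted point set $P$ and a reference set $G$ is shown below for orientation; the paper's exact variant is not specified here:

$$
d_{\mathrm{CD}}(P, G) = \frac{1}{|P|} \sum_{p \in P} \min_{g \in G} \lVert p - g \rVert_2 \;+\; \frac{1}{|G|} \sum_{g \in G} \min_{p \in P} \lVert g - p \rVert_2
$$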
📝 Abstract
In this paper, an automatic labelling process for automotive datasets is presented, leveraging complementary information from LiDAR and camera. The generated labels are then used as ground truth, with the corresponding 4D radar data as inputs to a proposed semantic segmentation network that associates a class label with each spatial voxel. Promising results are shown by applying both approaches to the publicly shared RaDelft dataset, with the proposed network achieving over 65% of the LiDAR detection performance, improving vehicle detection probability by 13.2%, and reducing Chamfer distance by 0.54 m, compared to variants inspired by the literature.
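As a rough illustration of the kind of lightweight 3D convolutional network that maps a radar tensor to per-voxel class logits, a minimal PyTorch-style sketch follows; the class name, layer widths, input shape, and class count are hypothetical and not taken from the paper:

```python
# Hypothetical sketch: a small 3D CNN mapping a radar tensor of shape
# (batch, channels, D, H, W) to per-voxel class logits. All sizes are
# illustrative, not the authors' architecture.
import torch
import torch.nn as nn

class RadarSegNet3D(nn.Module):
    def __init__(self, in_channels: int = 1, num_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, num_classes, kernel_size=1),  # per-voxel logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: one single-channel 32x64x64 voxel grid in, per-class logits out.
model = RadarSegNet3D()
logits = model(torch.randn(1, 1, 32, 64, 64))  # -> (1, 3, 32, 64, 64)
```

In a pipeline like the one described, such a network would be trained against the LiDAR-camera-derived voxel labels, with the Chamfer term encouraging predicted occupied voxels to lie close to the reference geometry.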