🤖 AI Summary
This work addresses a key bottleneck for large-scale radar learning: real-world millimeter-wave radar data is scarce and costly to annotate. The authors propose an end-to-end framework that synthesizes realistic radar point clouds from a single RGB image. By leveraging monocular depth estimation and semantic segmentation to reconstruct the 3D scene, and employing a vision-language model to infer object material properties, the method drives a physics-based ray-tracing simulation grounded in Fresnel reflection principles and electromagnetic parameters from the ITU-R database. This approach generates radar point clouds with consistent geometric and electromagnetic characteristics without manual modeling. Evaluated on real indoor scenes, a strategy of pretraining on synthetic data followed by fine-tuning on real data significantly improves radar-based 3D object detection, achieving up to a 3.7-point gain in 3D AP (IoU=0.3), primarily due to enhanced spatial localization accuracy.
📝 Abstract
Millimeter-wave (mmWave) radar provides reliable perception in visually degraded indoor environments (e.g., smoke, dust, and low light), but learning-based radar perception is bottlenecked by the scarcity and cost of collecting and annotating large-scale radar datasets. We present Sim2Radar, an end-to-end framework that synthesizes training radar data directly from single-view RGB images, enabling scalable data generation without manual scene modeling. Sim2Radar reconstructs a material-aware 3D scene by combining monocular depth estimation, segmentation, and vision-language reasoning to infer object materials, then simulates mmWave propagation with a configurable physics-based ray tracer using Fresnel reflection models parameterized by ITU-R electromagnetic properties. Evaluated on real-world indoor scenes, Sim2Radar improves downstream 3D radar perception via transfer learning: pre-training a radar point-cloud object detection model on synthetic data and fine-tuning on real radar data yields up to +3.7 3D AP (IoU 0.3), with gains driven primarily by improved spatial localization. These results suggest that physics-based, vision-driven radar simulation can provide effective geometric priors for radar learning and measurably improve performance under limited real-data supervision.
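To make the physics layer concrete, the sketch below shows one plausible way a ray tracer could evaluate Fresnel reflection at a surface hit, with the material's complex permittivity derived from ITU-R P.2040-style frequency-dependent coefficients. This is an illustrative reconstruction, not the paper's implementation: the function names are hypothetical, and the concrete coefficients (a, b, c, d) are assumed to follow the P.2040 parameterization.

```python
import numpy as np

def itu_permittivity(a, b, c, d, f_ghz):
    """Complex relative permittivity per the ITU-R P.2040 model:
    eps' = a * f^b, sigma = c * f^d (S/m), eps'' = 17.98 * sigma / f (f in GHz)."""
    eps_real = a * f_ghz ** b
    sigma = c * f_ghz ** d
    return eps_real - 1j * 17.98 * sigma / f_ghz

def fresnel_reflection(eta, theta_inc):
    """Fresnel amplitude reflection coefficients (TE and TM polarization)
    for a wave in air hitting a material with complex permittivity eta."""
    cos_t = np.cos(theta_inc)
    root = np.sqrt(eta - np.sin(theta_inc) ** 2)
    r_te = (cos_t - root) / (cos_t + root)
    r_tm = (eta * cos_t - root) / (eta * cos_t + root)
    return r_te, r_tm

# Example: concrete at 77 GHz, 45° incidence
# (a, b, c, d values assumed from an ITU-R P.2040-style material table)
eta = itu_permittivity(5.31, 0.0, 0.0326, 0.8095, 77.0)
r_te, r_tm = fresnel_reflection(eta, np.deg2rad(45.0))
print(abs(r_te) ** 2, abs(r_tm) ** 2)  # power reflectances per polarization
```

In a ray tracer of this kind, the squared magnitudes of these coefficients would scale the power carried by each reflected ray, so material choice directly shapes the intensity of the synthesized radar returns.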