🤖 AI Summary
Millimeter-wave (mmWave) radar offers low cost and high robustness but suffers from poor angular resolution in single-chip configurations, while the absence of standardized foundation models and large-scale datasets hinders high-precision perception in automotive and indoor scenarios. To address this, we propose GRT, the first general-purpose foundation model for single-chip mmWave radar. GRT processes raw radar signals with a Transformer architecture and leverages large-scale self-supervised pretraining (29 hours, >1 million samples) coupled with joint 4D occupancy and semantic modeling. We empirically demonstrate, for the first time, that training on raw data significantly outperforms training on compressed representations, and that GRT enables effective cross-scenario transfer and multi-task fine-tuning. Experimental results show that GRT achieves 4D perception performance on par with high-resolution sensors, using only low-cost, single-chip radar hardware.
📝 Abstract
mmWave radars are compact, inexpensive, and durable sensors that are robust to occlusions and work regardless of environmental conditions such as weather and darkness. However, this comes at the cost of poor angular resolution, especially for inexpensive single-chip radars, which are typically used in automotive and indoor sensing applications. Although many learning-based methods have been proposed to mitigate this weakness, no standardized foundation models or large datasets for mmWave radar have emerged, and practitioners have largely trained task-specific models from scratch using relatively small datasets.
In this paper, we collect (to our knowledge) the largest available raw radar dataset, with 1M samples (29 hours), and train a foundation model for 4D single-chip radar that can predict 3D occupancy and semantic segmentation with quality typically only possible with much higher resolution sensors. We demonstrate that our Generalizable Radar Transformer (GRT) generalizes across diverse settings, can be fine-tuned for different tasks, and exhibits logarithmic data scaling, improving roughly 20% per $10\times$ increase in training data. We also run extensive ablations on common design decisions, and find that using raw radar data significantly outperforms widely-used lossy representations, equivalent to a $10\times$ increase in training data. Finally, we roughly estimate that $\approx$100M samples (3000 hours) of data are required to fully exploit the potential of GRT.
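The logarithmic scaling claim (roughly 20% improvement per $10\times$ data) can be sketched as a simple extrapolation. The anchor values (`n_base`, `perf_base`) and the compounding functional form below are illustrative assumptions, not numbers from the paper; only the 20%-per-decade rate comes from the abstract.

```python
import math

def extrapolate_performance(n_samples, n_base=1e6, perf_base=0.50,
                            gain_per_decade=0.20):
    """Estimate performance after logarithmic data scaling.

    n_base / perf_base are assumed anchor points (hypothetical);
    gain_per_decade is the ~20% relative improvement per 10x data
    reported in the abstract, compounded per decade of data.
    """
    decades = math.log10(n_samples / n_base)
    return perf_base * (1 + gain_per_decade) ** decades

# Going from 1M to 100M samples spans 2 decades of data,
# so the assumed anchor compounds twice: 0.50 * 1.2^2 = 0.72
print(extrapolate_performance(1e8))  # → 0.72
```

Under this sketch, reaching the estimated $\approx$100M samples would compound the per-decade gain twice relative to the current 1M-sample dataset, which is consistent with the paper's framing of how much headroom remains.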