🤖 AI Summary
To address the high noise levels and the difficulty of distinguishing static from moving vehicles in single-frame automotive radar point clouds, this paper proposes the Radar Velocity Transformer (RVT), the first method to achieve high-performance moving object segmentation using only single-frame radar data. Methodologically, RVT integrates Doppler velocity features into every Transformer module and introduces a self-attention-based adaptive upsampling structure to jointly model spatial coordinates and velocity information. The authors establish a new moving object segmentation benchmark on the RadarScenes dataset and train RVT end-to-end. Experiments demonstrate that RVT surpasses existing methods in segmentation accuracy while running faster than the radar frame rate, enabling real-time scene understanding. Key contributions include: (1) the first Transformer architecture specifically designed for single-frame radar moving object segmentation; (2) a velocity-aware feature fusion and upsampling paradigm; and (3) a new benchmark with state-of-the-art performance.
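The self-attention-based adaptive upsampling can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name, the joint (x, y, Doppler) embedding, and the distance-based attention logits are illustrative assumptions that merely show how velocity can influence how coarse-level features are combined for each upsampled point.

```python
import numpy as np

def attention_upsample(fine_pts, coarse_pts, coarse_feats, tau=1.0):
    """Toy attention-based upsampling (hypothetical simplification).

    Each fine point attends over all coarse points; attention logits
    are negative squared distances in the joint (x, y, doppler) space,
    so points with similar position AND velocity get higher weight.

    fine_pts:     (N_fine, 3)   columns: x, y, doppler velocity
    coarse_pts:   (N_coarse, 3)
    coarse_feats: (N_coarse, C) features to propagate to fine points
    """
    # pairwise squared distances in joint space: (N_fine, N_coarse)
    d2 = ((fine_pts[:, None, :] - coarse_pts[None, :, :]) ** 2).sum(-1)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)             # softmax over coarse points
    return w @ coarse_feats                       # adaptively weighted features

# Usage: a fine point coincident with the first coarse point (same position
# and Doppler velocity) should inherit essentially that point's feature.
fine = np.array([[0.0, 0.0, 0.0]])
coarse = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 5.0]])
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
out = attention_upsample(fine, coarse, feats)
```

Unlike fixed-weight interpolation, the softmax weights here adapt to the data, which is the property the paper attributes to its transformer-based upsampling.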
📝 Abstract
The awareness of moving objects in the surroundings of a self-driving vehicle is essential for safe and reliable autonomous navigation. The interpretation of LiDAR and camera data achieves exceptional results but typically requires accumulating and processing temporal sequences of data in order to extract motion information. In contrast, radar sensors, which are already installed in most recent vehicles, can overcome this limitation as they directly provide the Doppler velocity of the detections and hence incorporate instantaneous motion information within a single measurement. In this paper, we tackle the problem of moving object segmentation in noisy radar point clouds. We also consider differentiating parked from moving cars to enhance scene understanding. Instead of exploiting temporal dependencies to identify moving objects, we develop a novel transformer-based approach to accurately perform single-scan moving object segmentation in sparse radar scans. The key to our Radar Velocity Transformer is to incorporate the valuable velocity information throughout each module of the network, thereby enabling the precise segmentation of moving and non-moving objects. Additionally, we propose a transformer-based upsampling, which enhances performance by adaptively combining information and overcoming the limitations of interpolation on sparse point clouds. Finally, we create a new radar moving object segmentation benchmark based on the RadarScenes dataset and compare our approach to other state-of-the-art methods. Our network runs faster than the frame rate of the sensor and shows superior segmentation results using only single-scan radar data.