AI Summary
This work addresses the lack of synchronized, well-annotated open-source datasets for fusing neuromorphic vision with radar. To this end, we present the first multimodal dataset that integrates temporally aligned event streams from Dynamic Vision Sensors (DVS), dual-band (24 GHz and 77 GHz) radar, and RGB-D camera data. Collected in office environments, the dataset covers 16 object categories and provides large-scale, time-synchronized annotations in COCO format. A dedicated DVS-radar subset supports evaluation of human detection and distance estimation. Experimental results show that fusing DVS with 77 GHz radar achieves a mean average precision (mAP) of up to 47.5% in human detection, while radar-based distance estimation yields a mean absolute error below 1.8 meters when benchmarked against LiDAR ground truth, supporting the dataset's usefulness for multimodal perception research.
Abstract
We present NERVE (Neuromorphic Vision and Radar Ensemble), a multi-sensor dataset comprising 257 minutes of synchronized recordings from five sensors: two Dynamic Vision Sensors (DVS), an RGB-D camera, and two Radar units (24GHz and 77GHz). Captured across 12 measurement days in office environments, NERVE contains around 600GB of uncompressed temporally aligned data with around 914,000 frames and around 9.6 million RGB COCO-formatted annotations covering 16 relevant object categories. To evaluate multi-modal fusion, we construct a DVS+Radar subset for human detection and distance estimation. Baseline experiments using feed-forward and recurrent detectors show that combining DVS with 77GHz Radar consistently improves detection, with recurrent models achieving up to 47.5% mAP and mean absolute Radar distance errors below 1.8m against LiDAR ground truth.
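The distance evaluation mentioned above compares radar range estimates with LiDAR ground truth using a mean absolute error. The following is a minimal sketch of how such a comparison could be computed, not the authors' evaluation code. It assumes each sensor provides a time-sorted list of `(timestamp_s, distance_m)` pairs, and that a radar sample is paired with the nearest LiDAR sample in time. The function names, the data layout, and the `max_gap` tolerance are all hypothetical and are not taken from the dataset.

```python
from bisect import bisect_left


def nearest_index(sorted_ts, t):
    """Return the index of the timestamp in sorted_ts that is closest to t."""
    i = bisect_left(sorted_ts, t)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if sorted_ts[i] - t < t - sorted_ts[i - 1] else i - 1


def mean_absolute_error(radar, lidar, max_gap=0.05):
    """Mean absolute distance error of radar against LiDAR, in meters.

    radar, lidar: lists of (timestamp_s, distance_m), sorted by timestamp.
    max_gap: hypothetical tolerance in seconds; radar samples without a
             LiDAR sample this close in time are skipped.
    Returns None when no radar sample could be paired.
    """
    lidar_ts = [t for t, _ in lidar]
    errors = []
    for t, d in radar:
        j = nearest_index(lidar_ts, t)
        if abs(lidar_ts[j] - t) <= max_gap:
            errors.append(abs(d - lidar[j][1]))
    return sum(errors) / len(errors) if errors else None


# Toy values for illustration only; these are not measurements from NERVE.
radar = [(0.00, 2.0), (0.10, 3.5), (1.00, 9.0)]
lidar = [(0.01, 2.5), (0.11, 3.0), (0.50, 4.0)]
print(mean_absolute_error(radar, lidar))
```

In this toy input the third radar sample has no LiDAR sample within the tolerance and is left out of the average. Skipping unmatched samples rather than interpolating is one possible design choice; the dataset's own synchronization procedure may differ.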