🤖 AI Summary
To address insufficient near-field perception in crowded, unstructured environments, particularly the challenge of detecting and understanding truncated or occluded obstacles, this paper introduces RoboSense, the first omnidirectional, multimodal egocentric navigation dataset for robotics. It comprises 133K synchronized multimodal samples (RGB, LiDAR, and fisheye), 1.4M 3D bounding boxes with instance IDs, and 216K expert-annotated navigation trajectories, covering the full 360° field of view and modeling dynamic near-field obstacles. The authors propose the first near-field 3D matching criterion and evaluation metrics, formalizing six core egocentric navigation tasks. RoboSense provides 270× and 18× as many near-field obstacle annotations as KITTI and nuScenes, respectively. Key technical contributions include RGB-LiDAR-fisheye sensor fusion, synchronized multi-sensor calibration, adaptive field-of-view configuration, and privacy-preserving data anonymization. Benchmarks demonstrate significant improvements in truncated/occluded obstacle detection and motion prediction.
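To make the dataset composition above concrete, here is a minimal, hypothetical sketch of what one synchronized sample record might look like. The field names (`Box3D`, `Sample`, `near_field_boxes`, `max_range_m`) are illustrative assumptions, not the actual RoboSense schema or API:

```python
# Hypothetical sketch of a synchronized multimodal sample; NOT the
# official RoboSense format, just an illustration of its components.
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class Box3D:
    center: np.ndarray   # (x, y, z) in the ego frame, meters
    size: np.ndarray     # (length, width, height), meters
    yaw: float           # heading angle, radians
    instance_id: int     # persistent ID linking boxes into a trajectory
    category: str        # e.g. "pedestrian", "cyclist"


@dataclass
class Sample:
    timestamp_us: int           # shared synchronization timestamp
    rgb_paths: List[str]        # pinhole camera images
    fisheye_paths: List[str]    # wide-FoV fisheye images
    lidar_path: str             # 360-degree point cloud sweep
    boxes: List[Box3D] = field(default_factory=list)


def near_field_boxes(sample: Sample, max_range_m: float = 5.0) -> List[Box3D]:
    """Keep only annotations whose BEV distance from the ego agent is small."""
    return [b for b in sample.boxes
            if float(np.linalg.norm(b.center[:2])) <= max_range_m]
```

A loader like this would make the near-field annotation-density comparison (270× KITTI, 18× nuScenes) directly measurable by counting `near_field_boxes` per sample.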
📝 Abstract
Reliable embodied perception from an egocentric perspective is challenging yet essential for the autonomous navigation of intelligent mobile agents. With the growing demand for social robotics, near-field scene understanding has become an important research topic for egocentric perception tasks related to navigation in crowded, unstructured environments. Due to complex environmental conditions and the truncation and occlusion of surrounding obstacles, perception capability in such settings remains inferior. To further enhance the intelligence of mobile robots, in this paper we set up an egocentric multi-sensor data-collection platform based on three main sensor types (camera, LiDAR, and fisheye), which supports flexible sensor configurations that enable a dynamic field of view from the ego perspective, capturing either near or more distant areas. Meanwhile, we construct a large-scale multimodal dataset, named RoboSense, to facilitate egocentric robot perception. Specifically, RoboSense contains more than 133K synchronized frames with 1.4M 3D bounding boxes and instance IDs annotated over the full $360^{\circ}$ view, forming 216K trajectories across 7.6K temporal sequences. It has $270\times$ and $18\times$ as many annotations of near-range surrounding obstacles as previous datasets collected for autonomous driving, such as KITTI and nuScenes, respectively. Moreover, we define a novel matching criterion and evaluation metrics for near-field 3D perception and prediction. Based on RoboSense, we formulate six popular tasks to facilitate future research, and provide detailed analyses as well as benchmarks accordingly. Data desensitization measures have been taken for privacy protection.
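The abstract mentions a novel near-field matching criterion but does not specify it. For context only, a conventional baseline it would replace is nuScenes-style greedy matching by bird's-eye-view (BEV) center distance, sketched below. This is explicitly not the paper's criterion; `greedy_match` and the threshold value are illustrative assumptions:

```python
# Conventional BEV center-distance matching (nuScenes-style baseline),
# shown for contrast with the paper's (unspecified) near-field criterion.
import numpy as np


def greedy_match(pred_centers: np.ndarray,   # (P, 2) BEV centers, sorted by score
                 gt_centers: np.ndarray,     # (G, 2) BEV centers
                 dist_thresh_m: float = 0.5):
    """Greedily assign predictions to ground-truth boxes within a distance threshold."""
    matched_gt = set()
    pairs = []
    for p, pc in enumerate(pred_centers):
        if gt_centers.shape[0] == 0:
            break
        dists = np.linalg.norm(gt_centers - pc, axis=1)
        dists[list(matched_gt)] = np.inf     # each GT box may match at most once
        g = int(np.argmin(dists))
        if dists[g] <= dist_thresh_m:
            pairs.append((p, g))             # (prediction index, GT index)
            matched_gt.add(g)
    return pairs


# Tiny usage example: one true positive, one unmatched prediction.
preds = np.array([[1.0, 0.5], [4.0, 4.0]])
gts = np.array([[1.2, 0.4]])
print(greedy_match(preds, gts))  # -> [(0, 0)]
```

A fixed absolute threshold like this is known to behave poorly for large, truncated obstacles right next to the ego agent, which is presumably the gap the paper's near-field criterion is designed to address.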