AI Summary
Existing road datasets lack multimodal annotations of real-scale small obstacles under adverse weather and lighting conditions, limiting the robustness of real-time detection. To address this, we introduce AVOID, the first driving dataset explicitly designed for real-time obstacle detection under challenging conditions including rain, fog, and low illumination. Each frame provides synchronized, co-registered RGB images, semantic maps, depth maps, raw and semantic LiDAR point clouds, and trajectory points. We establish a novel paradigm for multi-sensor synchronization and high-precision calibration within a unified visual domain, enabling a joint multi-task benchmark for semantic segmentation, depth estimation, and trajectory prediction. Constructed via high-fidelity simulation, AVOID is benchmarked with YOLO-based detectors and a custom multi-task U-Net architecture. Experiments demonstrate significantly improved obstacle detection robustness; the multi-task model outperforms single-task baselines by +3.2% mIoU and −11.7% depth RMSE, and notably enhances trajectory prediction accuracy.
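The per-frame data layout described above can be sketched as a simple container with a consistency check over the co-registered, pixel-aligned modalities. This is an illustrative sketch, not the dataset's actual API; the class name, field names, and shapes are assumptions based on the modality list in the summary.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AvoidFrame:
    """One hypothetical AVOID sample: all image-space modalities share one visual domain."""
    rgb: np.ndarray        # (H, W, 3) uint8 camera image
    semantic: np.ndarray   # (H, W) integer class labels
    depth: np.ndarray      # (H, W) float metric depth map
    lidar_xyz: np.ndarray  # (N, 3) raw LiDAR point cloud
    lidar_cls: np.ndarray  # (N,) per-point semantic labels
    waypoints: np.ndarray  # (K, 2) future trajectory points

    def check_registration(self) -> bool:
        """Verify that pixel-wise modalities are co-registered and LiDAR labels align."""
        h, w = self.rgb.shape[:2]
        return (self.semantic.shape == (h, w)
                and self.depth.shape == (h, w)
                and self.lidar_xyz.shape[0] == self.lidar_cls.shape[0])
```

A frame built with matching resolutions passes the check; mismatched semantic or depth maps would fail it.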
Abstract
Understanding road scenes through visual perception remains crucial for intelligent self-driving cars. In particular, it is desirable to detect unexpected small road hazards reliably in real time, especially under varying adverse conditions (e.g., weather and daylight). However, existing road driving datasets provide large-scale images acquired in either normal or adverse scenarios only, and often do not capture road obstacles in the same visual domain as the other classes. To address this, we introduce a new dataset called AVOID, the Adverse Visual Conditions Dataset, for real-time obstacle detection, collected in a simulated environment. AVOID contains a large set of unexpected road obstacles placed along each path, captured under various weather and time-of-day conditions. Each image is coupled with the corresponding semantic and depth maps, raw and semantic LiDAR data, and waypoints, thereby supporting most visual perception tasks. We benchmark high-performing real-time networks on the obstacle detection task, and also propose a comprehensive multi-task network for semantic segmentation, depth estimation, and waypoint prediction, together with ablation studies.
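Joint training on the three tasks above is typically driven by a weighted sum of per-task losses. The sketch below is an assumption about how such an objective could be composed; the loss choices (pixel-wise cross-entropy for segmentation, RMSE for depth as reported in the summary, mean Euclidean error for waypoints) and the weights are illustrative, not the paper's exact formulation.

```python
import numpy as np

def multitask_loss(seg_logits, seg_gt, depth_pred, depth_gt,
                   wp_pred, wp_gt, w_seg=1.0, w_depth=1.0, w_wp=1.0):
    """Illustrative weighted multi-task objective for segmentation, depth, waypoints."""
    # Segmentation: pixel-wise cross-entropy over class logits of shape (H, W, C),
    # computed via a numerically stable log-softmax.
    shifted = seg_logits - seg_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    seg_loss = -np.take_along_axis(log_probs, seg_gt[..., None], axis=-1).mean()
    # Depth: root-mean-square error between predicted and ground-truth depth maps.
    depth_loss = np.sqrt(np.mean((depth_pred - depth_gt) ** 2))
    # Waypoints: mean Euclidean distance between predicted and ground-truth points.
    wp_loss = np.mean(np.linalg.norm(wp_pred - wp_gt, axis=-1))
    return w_seg * seg_loss + w_depth * depth_loss + w_wp * wp_loss
```

With perfect depth and waypoint predictions and uniform (all-zero) segmentation logits over C classes, only the segmentation term remains and equals log(C).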