🤖 AI Summary
Autonomous driving still struggles with perception and navigation in long-tail dynamic scenarios such as work zones, where open datasets are scarce and model performance remains inadequate. To address this, the authors introduce ROADWork, an open-source multimodal dataset for work zone perception and navigation, with a benchmark spanning five tasks: object detection, work zone discovery, sign detection and OCR-based reading, semantic description, and path planning. Built from real-world navigation videos, ROADWork provides joint annotations including bounding boxes, natural-language descriptions, drivable paths, and navigation goals. Training with the dataset yields substantial improvements over state-of-the-art foundation models: +26.2 AP in object detection; +32.5% precision and a 12.8× higher work zone discovery rate; +23.9 AP in sign detection and +14.2% 1-NED in sign reading; +36.7 SPICE in description quality; and 53.6% of predicted navigation goals and 75.3% of predicted pathways with angular error below 0.5°.
📝 Abstract
Perceiving and navigating through work zones is challenging and under-explored, even with major strides in self-driving research. An important reason is the lack of open datasets for developing new algorithms to address this long-tailed scenario. We propose the ROADWork dataset to learn to recognize, observe, analyze, and drive through work zones. We find that state-of-the-art foundation models perform poorly on work zones. With our dataset, we improve work zone object detection (+26.2 AP) and discover work zones with higher precision (+32.5%) at a much higher discovery rate (12.8 times); we also significantly improve work zone sign detection (+23.9 AP), sign reading (+14.2% 1-NED), and work zone description (+36.7 SPICE). We further compute drivable paths from work zone navigation videos and show that it is possible to predict navigational goals and pathways such that 53.6% of goals have angular error (AE) < 0.5 degrees (+9.9%) and 75.3% of pathways have AE < 0.5 degrees (+8.1%).
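The two headline metrics can be made concrete. A minimal sketch, assuming the standard definitions (1-NED as one minus Levenshtein distance normalized by the longer string, and angular error as the angle between predicted and ground-truth direction vectors); the paper's exact evaluation protocol may differ in details such as normalization and coordinate frame:

```python
import math

def edit_distance(a: str, b: str) -> int:
    # Classic single-row dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[-1]

def one_ned(pred: str, gt: str) -> float:
    # 1 - normalized edit distance: 1.0 means a perfect sign reading.
    if not pred and not gt:
        return 1.0
    return 1.0 - edit_distance(pred, gt) / max(len(pred), len(gt))

def angular_error_deg(pred_xy, gt_xy) -> float:
    # Angle (degrees) between predicted and ground-truth goal directions.
    dot = pred_xy[0] * gt_xy[0] + pred_xy[1] * gt_xy[1]
    norm = math.hypot(*pred_xy) * math.hypot(*gt_xy)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp for numerical safety
    return math.degrees(math.acos(cos))
```

A goal prediction then counts toward the 53.6% figure when `angular_error_deg(pred, gt) < 0.5`.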