ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones

📅 2024-06-11
🏛️ arXiv.org
📈 Citations: 1 (Influential: 0)

🤖 AI Summary
Autonomous driving still struggles to perceive and navigate long-tail dynamic scenarios such as work zones, where open datasets are scarce and model performance remains inadequate. To address this, the paper introduces ROADWork, an open multimodal dataset for learning to recognize, observe, analyze, and drive through work zones. It defines a benchmark spanning work zone object detection, work zone discovery, traffic sign detection and OCR-based reading, semantic description, and path planning. Built from real-world navigation videos, ROADWork provides joint annotations including bounding boxes, natural-language descriptions, drivable paths, and navigation goals. Models fine-tuned on the dataset substantially outperform state-of-the-art foundation models: +26.2 AP in object detection; +32.5% precision and a 12.8× higher rate in work zone discovery; +23.9 AP in sign detection and +14.2% 1-NED in sign reading; +36.7 SPICE in description quality; and 53.6% of predicted navigation goals and 75.3% of predicted pathways with angular error below 0.5°.

📝 Abstract
Perceiving and navigating through work zones is challenging and under-explored, even with major strides in self-driving research. An important reason is the lack of open datasets for developing new algorithms to address this long-tailed scenario. We propose the ROADWork dataset to learn how to recognize, observe, analyze, and drive through work zones. We find that state-of-the-art foundation models perform poorly on work zones. With our dataset, we improve upon detecting work zone objects (+26.2 AP), while discovering work zones with higher precision (+32.5%) at a much higher discovery rate (12.8 times), and significantly improve detecting (+23.9 AP) and reading (+14.2% 1-NED) work zone signs and describing work zones (+36.7 SPICE). We also compute drivable paths from work zone navigation videos and show that it is possible to predict navigational goals and pathways such that 53.6% of goals have angular error (AE) < 0.5 degrees (+9.9%) and 75.3% of pathways have AE < 0.5 degrees (+8.1%).
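The abstract's evaluation metrics can be made concrete. The sketch below is hypothetical helper code (not the authors' evaluation suite) for two of the reported measures: angular error between a predicted and ground-truth 2D goal direction, and 1-NED (one minus normalized edit distance) used to score sign reading.

```python
import math

def angular_error_deg(pred, gt):
    """Angular error in degrees between two 2D direction vectors.

    A goal prediction counts toward the paper's headline number
    when this value is below 0.5 degrees.
    """
    dot = pred[0] * gt[0] + pred[1] * gt[1]
    norm = math.hypot(*pred) * math.hypot(*gt)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))

def one_minus_ned(pred, gt):
    """1 - normalized Levenshtein distance; 1.0 means a perfect sign read."""
    m, n = len(pred), len(gt)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == gt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 1.0 - dp[m][n] / max(m, n, 1)
```

For example, `angular_error_deg([1, 0], [0, 1])` is 90.0, and `one_minus_ned("ROAD", "WORK")` is 0.25 (three substitutions over four characters).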
Problem

Research questions and friction points this paper is trying to address.

Recognizing and navigating work zones autonomously
Improving perception in work zones via fine-tuning
Enhancing vision-language models for work zone descriptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning models improves work zone perception
Video label propagation boosts segmentation accuracy
Composing detectors reduces VLM hallucinations significantly
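The last point, composing detectors with a vision-language model to curb hallucinations, can be illustrated with a minimal hypothetical sketch (not the paper's implementation): work-zone objects mentioned in a generated description are kept only if a detector also finds them with sufficient confidence.

```python
# Hypothetical work-zone vocabulary; the paper's actual class list may differ.
WORK_ZONE_CLASSES = {"cone", "barrier", "work vehicle", "sign", "worker"}

def filter_caption_objects(mentioned, detections, min_score=0.5):
    """Drop work-zone objects a VLM mentions but a detector does not confirm.

    mentioned:  list of object words extracted from the generated caption
    detections: list of (class_name, confidence) pairs from a detector
    """
    confirmed = {cls for cls, score in detections if score >= min_score}
    return [obj for obj in mentioned
            if obj not in WORK_ZONE_CLASSES or obj in confirmed]
```

Here an unconfirmed mention like "worker" would be pruned, while non-work-zone words pass through untouched: `filter_caption_objects(["cone", "worker", "road"], [("cone", 0.9)])` returns `["cone", "road"]`.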
👥 Authors
Anurag Ghosh, Robotics Institute, Carnegie Mellon University
Robert Tamburo, Carnegie Mellon University
Shen Zheng, Research Scientist, Bytedance Seed
Juan R. Alvarez-Padilla, Carnegie Mellon University
Hailiang Zhu, Carnegie Mellon University
Michael Cardei, University of Virginia
Nicholas Dunn, Carnegie Mellon University
Christoph Mertz, Carnegie Mellon University
Srinivasa G. Narasimhan, Carnegie Mellon University