🤖 AI Summary
This study addresses the navigation safety risks that visually impaired people face in dynamic urban construction environments, including uneven terrain, temporary obstacles, hazardous materials, and abrupt path changes, by proposing the first multimodal real-time perception and assistive decision-making framework tailored to construction sites. Methodologically, it combines open-vocabulary object detection (GLIP/OWL-ViT), a lightweight customized YOLOv8 model specialized for scaffold pole detection, and PaddleOCR-based text parsing, augmented with multi-scale geometric calibration and angle-robustness optimization, to overcome the generalization bottleneck posed by highly diverse and irregular construction objects. Evaluated at seven real-world construction sites under static conditions, the framework achieves an overall detection accuracy of 88.56%, a 100% detection rate at 2–4 meters, and reliable detection across angular offsets of 0°–75° and distances of 2–10 meters.
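The summary does not say how the outputs of the three modules are combined. As a rough illustration only, the sketch below pools detections from an open-vocabulary detector, a scaffold-pole detector, and an OCR module, then drops low-confidence and duplicate results. The `Detection` type, the `merge_detections` function, the thresholds, and the sample values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected item; all field names here are illustrative assumptions."""
    source: str        # which module produced it: "open_vocab", "pole", or "ocr"
    label: str         # class name or recognized text
    confidence: float  # score in [0, 1]
    box: tuple         # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(groups, min_conf=0.3, iou_thresh=0.5):
    """Pool detections from every module, discard those below min_conf, and
    keep only the highest-confidence item among same-label boxes that overlap
    by at least iou_thresh. Both thresholds are assumed values."""
    pooled = sorted(
        (d for group in groups for d in group if d.confidence >= min_conf),
        key=lambda d: d.confidence,
        reverse=True,
    )
    kept = []
    for d in pooled:
        duplicate = any(
            k.label == d.label and iou(k.box, d.box) >= iou_thresh for k in kept
        )
        if not duplicate:
            kept.append(d)
    return kept


# Made-up sample outputs from the three modules for a single frame.
open_vocab = [
    Detection("open_vocab", "scaffolding pole", 0.6, (10, 10, 50, 200)),
    Detection("open_vocab", "traffic cone", 0.8, (100, 100, 150, 180)),
    Detection("open_vocab", "barrier", 0.2, (0, 0, 5, 5)),
]
pole = [Detection("pole", "scaffolding pole", 0.9, (12, 10, 52, 200))]
ocr = [Detection("ocr", "text: SIDEWALK CLOSED", 0.95, (200, 50, 300, 80))]

merged = merge_detections([open_vocab, pole, ocr])
for d in merged:
    print(f"{d.source:10s} {d.label:24s} {d.confidence:.2f}")
```

In this sketch the specialized pole detector wins over the open-vocabulary detector only because its score is higher in the sample data; a real system might instead give the specialized model fixed priority for its own class.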
📝 Abstract
Navigating urban environments poses significant challenges for people with disabilities, particularly those with blindness and low vision. Environments with dynamic and unpredictable elements like construction sites are especially challenging. Construction sites introduce hazards like uneven surfaces, obstructive barriers, hazardous materials, and excessive noise, and they can alter routing, complicating safe mobility. Existing assistive technologies are limited, as navigation apps do not account for construction sites during trip planning, and detection tools that attempt hazard recognition struggle to address the extreme variability of construction paraphernalia. This study introduces a novel computer vision-based system that integrates open-vocabulary object detection, a YOLO-based scaffolding-pole detection model, and an optical character recognition (OCR) module to comprehensively identify and interpret construction site elements for assistive navigation. In static testing across seven construction sites, the system achieved an overall accuracy of 88.56%, reliably detecting objects from 2 m to 10 m within a 0$^\circ$--75$^\circ$ angular offset. At closer distances (2--4 m), the detection rate was 100% at all tested angles. At