Edge-Optimized Vision-Language Models for Underground Infrastructure Assessment

📅 2026-02-03
🤖 AI Summary
This work addresses the challenge of automatically generating human-readable defect summaries from visual inspections of underground infrastructure, such as drainage pipes and culverts, on resource-constrained edge devices. The authors propose a lightweight two-stage, end-to-end pipeline. First, RAPID-SCAN, an efficient defect segmentation model with only 0.64M parameters, achieves an F1-score of 0.834. Second, a quantized and fine-tuned Phi-3.5 vision-language model generates domain-specific natural-language summaries from the segmentation outputs. The study presents the first integration of lightweight segmentation with an edge-optimized vision-language model, introduces the first dedicated dataset with human-verified descriptive annotations, and demonstrates real-time inference on a mobile robotic platform, improving both the interpretability and the deployment efficiency of intelligent infrastructure assessment systems.
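
The 0.834 figure is an F1-score for defect segmentation. The summary does not say whether it is computed per pixel, per defect instance, or averaged over defect classes, so the following is only a minimal sketch under the simplest assumption: pixel-wise F1 between one binary predicted mask and one binary ground-truth mask, in plain Python. The function name `pixel_f1` and the list-of-lists mask representation are hypothetical and not taken from RAPID-SCAN.

```python
from typing import List

Mask = List[List[int]]  # hypothetical representation: 1 = defect pixel, 0 = background


def pixel_f1(pred: Mask, target: Mask) -> float:
    """Pixel-wise F1 between two same-shaped binary masks (assumed protocol)."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # defect predicted where a defect exists
            elif p:
                fp += 1  # defect predicted on background
            elif t:
                fn += 1  # defect missed
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)
    return 2 * tp / denom


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(round(pixel_f1(pred, target), 4))  # 0.6667 (TP=2, FP=1, FN=1)
```

Pixel-wise F1 is numerically the same as the Dice coefficient; an instance-level F1 would instead match predicted and ground-truth defect regions before counting true and false positives.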

📝 Abstract
Autonomous inspection of underground infrastructure, such as sewer and culvert systems, is critical to public safety and urban sustainability. Although robotic platforms equipped with visual sensors can efficiently detect structural deficiencies, automatically generating human-readable summaries from these detections remains a significant challenge, especially on resource-constrained edge devices. This paper presents a novel two-stage pipeline for end-to-end summarization of underground deficiencies, combining our lightweight RAPID-SCAN segmentation model with a fine-tuned Vision-Language Model (VLM) deployed on an edge computing platform. The first stage employs RAPID-SCAN (Resource-Aware Pipeline Inspection and Defect Segmentation using Compact Adaptive Network), which achieves an F1-score of 0.834 with only 0.64M parameters for efficient defect segmentation. The second stage uses a fine-tuned Phi-3.5 VLM to generate concise, domain-specific natural-language summaries from the segmentation outputs. We introduce a curated dataset of inspection images with manually verified descriptions for VLM fine-tuning and evaluation. To enable real-time performance, we apply post-training quantization with hardware-specific optimization, achieving significant reductions in model size and inference latency without compromising summarization quality. We deploy and evaluate the complete pipeline on a mobile robotic platform, demonstrating its effectiveness in real-world inspection scenarios. Our results show the potential of integrated, edge-deployable AI systems to bridge the gap between automated defect detection and actionable insights for infrastructure maintenance, paving the way for more scalable and autonomous inspection solutions.
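
The abstract mentions post-training quantization with hardware-specific optimization but does not name the scheme or bit width used for Phi-3.5. As a generic illustration only, the sketch below shows symmetric per-tensor int8 quantization of a weight vector: a single scale maps the largest absolute weight to 127, each weight is rounded to the nearest integer step, and dequantization multiplies back by the scale. The function names are hypothetical, and real edge toolchains may use per-channel or per-group scales or lower bit widths instead.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization (a generic PTQ sketch, not the paper's scheme)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor: the largest |w| maps to 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
print(q)  # [50, -127, 0, 100]
restored = dequantize_int8(q, scale)
# Rounding to the nearest step bounds each reconstruction error by half a step.
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2)
```

Storing each weight as an 8-bit integer instead of a 32-bit float cuts weight storage by a factor of four, at the cost of the bounded rounding error checked in the last line.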
Problem

underground infrastructure
defect summarization
edge computing
vision-language model
autonomous inspection
Innovation

Edge AI
Vision-Language Model
Lightweight Segmentation
Post-training Quantization
Autonomous Infrastructure Inspection
Johny J. Lopez
Canizaro Livingston Gulf States Center for Environmental Informatics, the University of New Orleans, New Orleans, USA
M. Ferdaus
Canizaro Livingston Gulf States Center for Environmental Informatics, the University of New Orleans, New Orleans, USA
Mahdi Abdelguerfi
Professor of Computer Science, University of New Orleans
Geospatial Intelligence
Big Data
AI