Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites

📅 2025-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: This study addresses the need for accurate small-object detection of MEP (mechanical, electrical, and plumbing) components by ground robots on construction sites. Method: We systematically compare open-vocabulary vision-language models (specifically CLIP-based zero-shot detection) with lightweight closed-set detectors (YOLOv5/v8 fine-tuned on domain-specific data). The evaluation uses a manually annotated image dataset captured by ground robots in real construction environments, which serves as the first empirical benchmark of open-vocabulary models for specialized small-object detection. Results: Domain-adapted lightweight detectors achieve significantly higher mAP than open-vocabulary approaches (+23.6%), indicating that task-specific optimization remains indispensable under edge-computing constraints. We also identify domain adaptation as a key performance bottleneck for open-vocabulary models in specialized industrial settings. Our work establishes a reproducible benchmark and a practical deployment pathway for small-object recognition in construction automation.
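Since the comparison is reported in mAP, a minimal sketch of how that metric can be computed may help. It is not the authors' evaluation code. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), a single IoU threshold of 0.5, one image, and greedy matching of each prediction to the best still-unmatched ground-truth box. The class names in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], gts: List[Box],
                      iou_thr: float = 0.5) -> float:
    """AP for one class: area under the interpolated precision-recall curve."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags = []
    # Visit predictions from most to least confident.
    for _score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if matched[j]:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True  # each ground-truth box is matched at most once
        tp_flags.append(best_j >= 0)

    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Make precision non-increasing in recall (standard interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(
        per_class: Dict[str, Tuple[List[Tuple[float, Box]], List[Box]]]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    aps = [average_precision(p, g) for p, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

As a usage example, calling `mean_average_precision({"pipe": (pipe_preds, pipe_gts), "duct": (duct_preds, duct_gts)})` with hypothetical class labels returns a value between 0 and 1. Benchmark toolkits usually pool detections across all images and may average over several IoU thresholds, so their numbers will differ from this simplified version.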

📝 Abstract
The construction industry has long explored robotics and computer vision, yet their deployment on construction sites remains very limited. These technologies have the potential to revolutionize traditional workflows by enhancing accuracy, efficiency, and safety in construction management. Ground robots equipped with advanced vision systems could automate tasks such as monitoring mechanical, electrical, and plumbing (MEP) systems. The present research evaluates the applicability of open-vocabulary vision-language models compared to fine-tuned, lightweight, closed-set object detectors for detecting MEP components using a mobile ground robotic platform. A dataset collected with cameras mounted on a ground robot was manually annotated and analyzed to compare model performance. The results demonstrate that, despite the versatility of vision-language models, fine-tuned lightweight models still largely outperform them in specialized environments and for domain-specific tasks.
Problem

Research questions and friction points this paper is trying to address.

Image Recognition
Construction Sites
MEP Components
Innovation

Methods, ideas, or system contributions that make the work stand out.

Specialized Image Recognition Models
MEP Components Identification
Site-specific Accuracy Improvement
Abdalwhab Abdalwhab
Lab INIT Robots, Department of Mechanical Engineering, ETS Montreal, Canada
Ali Imran
Doctoral Candidate in Robotics, Lab INIT Robots, ETS Montreal
Robotics, Multi-robot Systems, Swarm Robotics, Human robot interaction
Sina Heydarian
GRIDD, Department of Construction Engineering, ETS Montreal, Canada
I. Iordanova
GRIDD, Department of Construction Engineering, ETS Montreal, Canada
David St-Onge
École de Technologie Supérieure
mechatronics, decentralized robotic systems, robotic art, human-robot interaction, airship design and control