OpenConstruction: A Systematic Synthesis of Open Visual Datasets for Data-Centric Artificial Intelligence in Construction Monitoring

📅 2025-08-15
🤖 AI Summary
Current publicly available computer vision datasets in the construction domain suffer from significant limitations in scale, modality diversity, annotation quality, and representativeness of real-world scenarios; moreover, they lack a systematic classification and evaluation framework, hindering the reliability and scalability of AI applications. To address this, we conduct a comprehensive survey of 51 publicly accessible construction-related vision datasets published between 2005 and 2024, encompassing RGB images, videos, and point clouds. We propose the first FAIR (Findable, Accessible, Interoperable, Reusable)-oriented classification framework tailored for construction monitoring, featuring a structured schema that enables multi-dimensional annotation characterization and application mapping. Furthermore, we develop and open-source OpenConstruction—a standardized, reproducible data catalog supporting key intelligent construction tasks such as progress tracking and safety monitoring. This work advances a data-centric paradigm for AI in construction.

📝 Abstract
The construction industry increasingly relies on visual data to support Artificial Intelligence (AI) and Machine Learning (ML) applications for site monitoring. High-quality, domain-specific datasets, comprising images, videos, and point clouds, capture site geometry and spatiotemporal dynamics, including the location and interaction of objects, workers, and materials. However, despite growing interest in leveraging visual datasets, existing resources vary widely in size, data modalities, annotation quality, and representativeness of real-world construction conditions. A systematic review that categorizes their data characteristics and application contexts is still lacking, limiting the community's ability to fully understand the dataset landscape, identify critical gaps, and guide future directions toward more effective, reliable, and scalable AI applications in construction. To address this gap, this study conducts an extensive search of academic databases and open-data platforms, yielding 51 publicly available visual datasets that span the 2005-2024 period. These datasets are categorized using a structured data schema covering (i) data fundamentals (e.g., size and license), (ii) data modalities (e.g., RGB and point cloud), (iii) annotation frameworks (e.g., bounding boxes), and (iv) downstream application domains (e.g., progress tracking). This study synthesizes these findings into an open-source catalog, OpenConstruction, supporting data-driven method development. Furthermore, the study discusses several critical limitations in the existing construction dataset landscape and presents a roadmap for future data infrastructure anchored in the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles. By reviewing the current landscape and outlining strategic priorities, this study supports the advancement of data-centric solutions in the construction sector.
Problem

Research questions and friction points this paper is trying to address.

Lack of systematic review for construction visual datasets
Variability in dataset quality and representativeness limits AI applications
Need for structured categorization and FAIR-compliant data infrastructure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic review of 51 open visual datasets
Categorization using structured data schema
Open-source catalog for data-driven development
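As a rough illustration of the structured data schema described in the abstract, the sketch below models one catalog record with the four groups it names: data fundamentals, data modalities, annotation frameworks, and downstream application domains. Every field name, the `DatasetEntry` class, the `filter_entries` helper, and the two placeholder records are assumptions made for this example; they are not taken from the OpenConstruction catalog itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetEntry:
    """One catalog record; all field names are illustrative assumptions."""
    # (i) data fundamentals
    name: str
    year: int
    size: int  # number of samples (images, clips, or scans)
    license: str
    # (ii) data modalities, e.g. "RGB", "point cloud"
    modalities: List[str] = field(default_factory=list)
    # (iii) annotation frameworks, e.g. "bounding boxes"
    annotations: List[str] = field(default_factory=list)
    # (iv) downstream application domains, e.g. "progress tracking"
    applications: List[str] = field(default_factory=list)


def filter_entries(
    entries: List[DatasetEntry],
    modality: Optional[str] = None,
    application: Optional[str] = None,
) -> List[DatasetEntry]:
    """Return the entries that match every criterion that is not None."""
    result = []
    for entry in entries:
        if modality is not None and modality not in entry.modalities:
            continue
        if application is not None and application not in entry.applications:
            continue
        result.append(entry)
    return result


# Placeholder records only; these are not real datasets from the catalog.
catalog = [
    DatasetEntry("placeholder-a", 2019, 1000, "CC BY 4.0",
                 ["RGB"], ["bounding boxes"], ["safety monitoring"]),
    DatasetEntry("placeholder-b", 2022, 50, "unspecified",
                 ["point cloud"], ["semantic labels"], ["progress tracking"]),
]

matches = filter_entries(catalog, modality="point cloud",
                         application="progress tracking")
print([entry.name for entry in matches])
```

Keeping the four schema groups as separate fields lets a user query by any combination of them, for example finding every point-cloud dataset suited to progress tracking.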
Ruoxin Xiong
College of Architecture & Environmental Design, Kent State University
Yanyu Wang
Bert S. Turner Department of Construction Management, Louisiana State University
Jiannan Cai
Assistant Professor, The University of Texas at San Antonio
Construction Automation and Robotics, Human-Robot Collaboration, Computer Vision, Data Analytics, Sensing
Kaijian Liu
Department of Civil, Environmental, and Ocean Engineering, Stevens Institute of Technology
Yuansheng Zhu
Department of Computing and Information Sciences, Rochester Institute of Technology
Pingbo Tang
Department of Civil and Environmental Engineering, Carnegie Mellon University
Nora El-Gohary
Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign