OpenFACADES: An Open Framework for Architectural Caption and Attribute Data Enrichment via Street View Imagery

📅 2025-04-01
🤖 AI Summary
Urban building attribute data (e.g., height, function, material) are scarce, hindering applications such as energy simulation and risk assessment. To address this, we propose OpenFACADES—an open, end-to-end building perception framework that pioneers the integration of equal-field-of-view spatial matching, panoramic façade reprojection, and fine-tuned open-source vision-language models (VLMs). It enables joint multi-attribute prediction and open-vocabulary semantic description generation. Leveraging heterogeneous open data—including Mapillary street-level imagery and OpenStreetMap—the framework automates façade extraction and performs structured attribute inference. Evaluated across seven cities using 30,180 annotated images, our fine-tuned VLM achieves significantly higher accuracy in joint multi-attribute prediction than both specialized single-task computer vision models and zero-shot ChatGPT-4o. OpenFACADES establishes a scalable, reproducible data infrastructure for urban spatial analysis.
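The panoramic façade reprojection step mentioned above can be illustrated with a minimal sketch. The paper's exact projection parameters are not given here, so the function below is an assumption-laden example of the general technique: mapping an equirectangular panorama to a rectilinear (perspective) view centred on a chosen yaw/pitch, which approximates how a façade would appear to an observer.

```python
import numpy as np

def pano_to_perspective(pano, yaw_deg, pitch_deg, fov_deg=90.0, out_size=(512, 512)):
    """Reproject an equirectangular panorama into a rectilinear view.

    pano: H x W x C array covering 360 x 180 degrees (top row = zenith).
    yaw_deg/pitch_deg: view direction; fov_deg: horizontal field of view.
    Returns an out_h x out_w x C array, sampled with nearest neighbour.
    """
    h, w = pano.shape[:2]
    out_w, out_h = out_size
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels

    # Pixel grid of the output image, centred on the optical axis
    # (camera frame: z forward, x right, y down).
    x = np.arange(out_w) - (out_w - 1) / 2
    y = np.arange(out_h) - (out_h - 1) / 2
    xx, yy = np.meshgrid(x, y)
    dirs = np.stack([xx, yy, np.full_like(xx, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays: pitch about the x-axis, then yaw about the vertical.
    p, q = np.radians(pitch_deg), np.radians(yaw_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(p), -np.sin(p)],
                      [0, np.sin(p), np.cos(p)]])
    rot_y = np.array([[np.cos(q), 0, np.sin(q)],
                      [0, 1, 0],
                      [-np.sin(q), 0, np.cos(q)]])
    dirs = dirs @ rot_x.T @ rot_y.T

    # Ray direction -> spherical lon/lat -> pixel coordinates in the panorama.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])     # [-pi, pi], 0 = forward
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))    # [-pi/2, pi/2]
    u = ((lon / np.pi + 1) / 2 * (w - 1)).astype(int)
    v = ((lat / (np.pi / 2) + 1) / 2 * (h - 1)).astype(int)
    return pano[v, u]
```

In practice one would sample with bilinear interpolation rather than nearest neighbour, and choose yaw/pitch/FOV from the camera-to-building geometry so that the whole façade falls inside the view.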

📝 Abstract
Building properties, such as height, usage, and material composition, play a crucial role in spatial data infrastructures, supporting applications such as energy simulation, risk assessment, and environmental modeling. Despite their importance, comprehensive and high-quality building attribute data remain scarce in many urban areas. Recent advances have enabled the extraction and tagging of objective building attributes using remote sensing and street-level imagery. However, establishing a method and pipeline that integrates diverse open datasets, acquires holistic building imagery at scale, and infers comprehensive building attributes remains a significant challenge. Among the first of its kind, this study bridges these gaps by introducing OpenFACADES, an open framework that leverages multimodal crowdsourced data to enrich building profiles with both objective attributes and semantic descriptors through multimodal large language models. Our methodology proceeds in three major steps. First, we integrate street-level image metadata from Mapillary with OpenStreetMap geometries via isovist analysis, effectively identifying images that provide suitable vantage points for observing target buildings. Second, we automate the detection of building facades in panoramic imagery and tailor a reprojection approach to convert objects into holistic perspective views that approximate real-world observation. Third, we introduce an innovative approach that harnesses and systematically investigates the capabilities of open-source large vision-language models (VLMs) for multi-attribute prediction and open-vocabulary captioning in building-level analytics, leveraging a globally sourced dataset of 30,180 labeled images from seven cities. Evaluation shows that fine-tuned VLMs excel in multi-attribute inference, outperforming single-attribute computer vision models and zero-shot ChatGPT-4o.
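The third step — joint multi-attribute prediction plus open-vocabulary captioning with a VLM — typically amounts to prompting the model for a structured response and parsing it. The attribute schema, key names, and wording below are hypothetical (the paper's exact label set is not reproduced here); this is only a sketch of how such a joint query might be composed and its reply parsed.

```python
import json
import re

# Illustrative attribute schema; keys and value ranges are assumptions.
ATTRIBUTES = {
    "floors": "integer count of storeys visible on the facade",
    "function": "one of: residential, commercial, industrial, public, mixed",
    "material": "dominant facade material (e.g. brick, concrete, glass)",
}

def build_prompt(attributes=ATTRIBUTES):
    """Compose a joint multi-attribute instruction for a vision-language model."""
    lines = [
        "You are shown a perspective view of a single building facade.",
        "Return one JSON object with exactly these keys:",
    ]
    lines += [f'- "{key}": {desc}' for key, desc in attributes.items()]
    lines.append('Also include a "caption" key with a one-sentence description.')
    return "\n".join(lines)

def parse_response(text):
    """Extract the first JSON object from a possibly chatty model reply."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))
```

Asking for all attributes in one structured reply is what makes the prediction "joint": the model conditions every attribute on the same visual evidence and on its own caption, rather than running one specialized model per attribute.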
Problem

Research questions and friction points this paper is trying to address.

Lack of comprehensive building attribute data in urban areas
Challenges in integrating diverse datasets for building analytics
Need for scalable methods to infer building attributes from imagery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates street view and OpenStreetMap via isovist analysis
Automates facade detection and reprojects panoramic imagery
Uses vision-language models for multi-attribute building analytics
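The first bullet — matching street-view camera positions to OpenStreetMap footprints via isovist analysis — rests on visibility testing: a camera is a suitable vantage point only if its sightline to the target building is not blocked by other footprints. A minimal 2-D sketch of that test (pure Python, with hypothetical coordinates; a production isovist would also consider viewing distance and angle):

```python
def _segments_intersect(p1, p2, p3, p4):
    """True if open segment p1-p2 properly crosses open segment p3-p4."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    # Proper crossing: each segment's endpoints lie on opposite sides
    # of the other segment's supporting line.
    return ((d1 > 0) != (d2 > 0)) and ((d3 > 0) != (d4 > 0))

def has_line_of_sight(camera, target_pt, occluders):
    """Check an unobstructed sightline from camera to a point on the target.

    camera, target_pt: (x, y) tuples in a projected coordinate system.
    occluders: footprints of *other* buildings, each a list of (x, y) vertices.
    """
    for poly in occluders:
        for i in range(len(poly)):
            a, b = poly[i], poly[(i + 1) % len(poly)]
            if _segments_intersect(camera, target_pt, a, b):
                return False
    return True
```

Repeating this test against several points along the target façade (not just one) approximates the isovist: the fraction of the façade visible from the camera can then rank candidate images by how complete a view they offer.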