ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models

📅 2025-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional research on architectural culture has long relied on expert judgment and literature reviews, and suffers from regional bias, poor reproducibility, and inadequate characterization of visual features. To address these limitations, this paper proposes ArchiLense, a vision-language framework for architectural analysis. It also introduces ArchDiffBench, described as the first high-quality, multi-source benchmark dataset for architectural style analysis. ArchiLense integrates state-of-the-art vision-language models (e.g., CLIP, Qwen-VL) with backbone architectures (e.g., ViT, ResNet) to enable automated architectural image recognition, style comparison across periods and regions, fine-grained classification, and generation of interpretable stylistic descriptions. Experimental results show 92.4% agreement between model-predicted and expert-annotated styles and 84.5% classification accuracy, which the authors present as a gain in objectivity, quantification, and scalability for architectural style analysis.

📝 Abstract

Architectural cultures across regions are characterized by stylistic diversity, shaped by historical, social, and technological contexts in addition to geographical conditions. Understanding architectural styles requires the ability to describe and analyze the stylistic features of different architects from various regions through visual observations of architectural imagery. However, traditional studies of architectural culture have largely relied on subjective expert interpretations and historical literature reviews, often suffering from regional biases and limited explanatory scope. To address these challenges, this study proposes three core contributions: (1) We construct a professional architectural style dataset named ArchDiffBench, which comprises 1,765 high-quality architectural images and their corresponding style annotations, collected from different regions and historical periods. (2) We propose ArchiLense, an analytical framework grounded in Vision-Language Models and constructed using the ArchDiffBench dataset. By integrating advanced computer vision techniques, deep learning, and machine learning algorithms, ArchiLense enables automatic recognition, comparison, and precise classification of architectural imagery, producing descriptive language outputs that articulate stylistic differences. (3) Extensive evaluations show that ArchiLense achieves strong performance in architectural style recognition, with a 92.4% consistency rate with expert annotations and 84.5% classification accuracy, effectively capturing stylistic distinctions across images. The proposed approach transcends the subjectivity inherent in traditional analyses and offers a more objective and accurate perspective for comparative studies of architectural culture.
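The abstract reports a classification accuracy but this page does not give the evaluation procedure. As a rough illustration only, the sketch below shows one common way such a figure can be computed: the fraction of images whose predicted style matches the expert label, plus a per-style breakdown. The function names, style labels, and sample data are all hypothetical and are not taken from the paper or from ArchDiffBench; the paper's own definitions of "consistency rate" and "classification accuracy" may differ.

```python
from collections import Counter


def classification_accuracy(predicted, expert):
    """Fraction of images whose predicted style equals the expert label.

    Assumption: one label per image, compared by exact string match.
    """
    if len(predicted) != len(expert):
        raise ValueError("label lists must have the same length")
    if not expert:
        raise ValueError("label lists must not be empty")
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)


def per_style_accuracy(predicted, expert):
    """Accuracy computed separately for each expert-annotated style."""
    totals = Counter(expert)
    correct = Counter(e for p, e in zip(predicted, expert) if p == e)
    return {style: correct[style] / totals[style] for style in totals}


# Hypothetical labels, invented purely for illustration.
expert = ["gothic", "gothic", "baroque", "modernist", "baroque"]
predicted = ["gothic", "baroque", "baroque", "modernist", "baroque"]

print(classification_accuracy(predicted, expert))
print(per_style_accuracy(predicted, expert))
```

A per-style breakdown is worth reporting alongside the overall number because a dataset collected from several regions and periods may be unevenly distributed across styles, and a single aggregate can hide weak performance on rarer ones.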

Problem

Research questions and friction points this paper is trying to address.

Overcoming subjective expert biases in architectural style analysis

Automating recognition and classification of diverse architectural styles

Providing objective quantitative analysis across regions and periods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Professional architectural style dataset ArchDiffBench

Vision-Language Models framework ArchiLense

Automatic recognition and classification of architecture
