ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models

📅 2025-06-09
🤖 AI Summary
Traditional research on architectural culture has long relied on subjective expert judgment and literature reviews, and it suffers from regional bias, poor reproducibility, and inadequate characterization of visual features. To address these limitations, this paper proposes ArchiLense, a vision-language framework for analyzing architectural styles. It also introduces ArchDiffBench, described as the first high-quality, multi-source benchmark dataset for architectural style analysis. ArchiLense integrates vision-language models (e.g., CLIP, Qwen-VL) with backbone architectures (e.g., ViT, ResNet) to enable automated architectural image recognition, style comparison across periods and regions, fine-grained classification, and generation of interpretable stylistic descriptions. Experiments show 92.4% agreement between model-predicted and expert-annotated styles and 84.5% classification accuracy, which improves the objectivity, quantification, and scalability of architectural style analysis.

📝 Abstract
Architectural cultures across regions are characterized by stylistic diversity, shaped by historical, social, and technological contexts in addition to geographical conditions. Understanding architectural styles requires the ability to describe and analyze the stylistic features of different architects from various regions through visual observation of architectural imagery. However, traditional studies of architectural culture have largely relied on subjective expert interpretations and historical literature reviews, and they often suffer from regional biases and limited explanatory scope. To address these challenges, this study makes three core contributions: (1) We construct a professional architectural style dataset named ArchDiffBench, which comprises 1,765 high-quality architectural images and their corresponding style annotations, collected from different regions and historical periods. (2) We propose ArchiLense, an analytical framework grounded in Vision-Language Models and constructed using the ArchDiffBench dataset. By integrating advanced computer vision techniques, deep learning, and machine learning algorithms, ArchiLense enables automatic recognition, comparison, and precise classification of architectural imagery, producing descriptive language outputs that articulate stylistic differences. (3) Extensive evaluations show that ArchiLense achieves strong performance in architectural style recognition, with a 92.4% consistency rate with expert annotations and 84.5% classification accuracy, effectively capturing stylistic distinctions across images. The proposed approach reduces the subjectivity inherent in traditional analyses and offers a more objective and accurate perspective for comparative studies of architectural culture.
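The abstract reports a consistency rate with expert annotations and a classification accuracy but does not say how either is computed. As a rough illustration only, the sketch below assumes both reduce to a label-match rate between model predictions and reference labels. The function names, style labels, and toy data are hypothetical and do not come from the paper or from ArchDiffBench.

```python
from collections import Counter


def agreement_rate(predicted, reference):
    """Fraction of items whose predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not predicted:
        raise ValueError("label lists must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


def per_style_accuracy(predicted, reference):
    """Match rate broken down by reference style label."""
    totals = Counter(reference)
    correct = Counter(r for p, r in zip(predicted, reference) if p == r)
    return {style: correct[style] / totals[style] for style in totals}


# Hypothetical toy labels, for illustration only (not data from the paper).
expert = ["Gothic", "Baroque", "Modernist", "Gothic", "Baroque"]
model = ["Gothic", "Baroque", "Gothic", "Gothic", "Modernist"]

overall = agreement_rate(model, expert)
by_style = per_style_accuracy(model, expert)
print(f"overall match rate: {overall:.1%}")
print(by_style)
```

A per-style breakdown like `by_style` can show whether an aggregate figure hides weak performance on styles with few images, which is a plausible concern for a dataset of 1,765 images spread across regions and periods.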
Problem

Research questions and friction points this paper is trying to address.

Overcoming subjective expert biases in architectural style analysis
Automating recognition and classification of diverse architectural styles
Providing objective quantitative analysis across regions and periods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Professional architectural style dataset ArchDiffBench
Vision-Language Models framework ArchiLense
Automatic recognition and classification of architecture
Authors
Jing Zhong
Jun Yin
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Peilin Li
National University of Singapore
Machine Learning, Architecture, Generative Design
Pengyu Zeng
Tsinghua University
Artificial Intelligence, Deep Learning
Miao Zhang
Shuai Lu
Ran Luo