ArchSeek: Retrieving Architectural Case Studies Using Vision-Language Models

📅 2025-03-24

📈 Citations: 0

✨ Influential: 0

career value

154K/year

🤖 AI Summary

In architectural design, conventional text-based retrieval fails to capture visual semantics and complex spatial relationships, resulting in inefficient and inaccurate case retrieval. This paper introduces the first fine-grained vision-language cross-modal retrieval framework tailored for architecture, integrating a multi-scale vision-language model (an enhanced CLIP variant), cross-modal embedding alignment, query refinement, and interactive feedback learning. The framework supports dual-modality queries (text or image) and delivers interpretable design inspiration recommendations. Its key innovations include modeling design intent within cross-modal alignment and enabling user-driven iterative optimization. Evaluated with professional architects, the method reduces average retrieval time by 62% and achieves an 89.3% Top-5 retrieval accuracy—significantly improving both efficiency and relevance in architectural case acquisition.

Technology Category

Application Category

📝 Abstract

Efficiently searching for relevant case studies is critical in architectural design, as designers rely on precedent examples to guide or inspire their ongoing projects. However, traditional text-based search tools struggle to capture the inherently visual and complex nature of architectural knowledge, often leading to time-consuming and imprecise exploration. This paper introduces ArchSeek, an innovative case study search system with recommendation capability, tailored for architecture design professionals. Powered by the visual understanding capabilities from vision-language models and cross-modal embeddings, it enables text and image queries with fine-grained control, and interaction-based design case recommendations. It offers architects a more efficient, personalized way to discover design inspirations, with potential applications across other visually driven design fields. The source code is available at https://github.com/danruili/ArchSeek.

Problem

Research questions and friction points this paper is trying to address.

Enables efficient visual and text-based architectural case study retrieval

Addresses limitations of traditional text-only search in architecture

Provides personalized design recommendations using vision-language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language models for architectural search

Cross-modal embeddings enable text-image queries

Interaction-based design case recommendations

🔎 Similar Papers

No similar papers found.