Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the efficient understanding of built environments in smart cities to support design recommendations, constructability assessment, land use analysis, and risk identification. It presents the first systematic investigation into the impact of multi-scale remote sensing imagery on semantic reasoning about built environments, integrating advanced vision-language foundation models such as InternVL and Qwen to develop a multimodal generative reasoning framework. Experimental results demonstrate significant improvements in semantic comprehension and intelligent recommendation capabilities. The work not only reveals notable differences in accuracy and reliability among various models when applied to remote sensing tasks but also establishes a novel paradigm for smart city analytics through multimodal generative reasoning.

📝 Abstract

This work investigates the use of large language models (LLMs) for tasks in smart cities. The core idea is to leverage remote sensing imagery to characterize the built environment, covering design suggestions, constructability assessment, land use patterns, and risk identification. We examine remote sensing imagery at multiple spatial scales as input for multimodal language modeling and evaluate the effect of scale on built-environment-related reasoning. In addition, we compare state-of-the-art LLMs, including InternVL and Qwen, in terms of accuracy and reliability when generating built environment recommendations. The results demonstrate the potential of integrating remote sensing imagery with large language models to assist decision-making in smart cities.

Problem

Research questions and friction points this paper is trying to address.

Built Environment

Remote Sensing Imagery

Smart Cities

Land Use

Risk Identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Vision-Language Models

Remote Sensing Imagery

Built Environment Reasoning