MMUEChange: A generalized LLM agent framework for intelligent multi-modal urban environment change analysis

📅 2026-01-09
🏛️ Applied Soft Computing
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the first modular multimodal large language model (LLM) agent framework tailored for urban change analysis, addressing the limitations of existing approaches that rely on single-modality inputs and rigid pipelines, which struggle to effectively integrate heterogeneous multisource data. The framework introduces a modality controller to enable dynamic intra- and cross-modal alignment, flexibly incorporating remote sensing imagery, nighttime light data, and textual information while mitigating LLM hallucinations. Evaluated on real-world urban case studies, the approach achieves a 46.7% improvement in task success rate over the strongest baseline, significantly enhancing semantic understanding, reasoning capabilities, and policy relevance. It successfully uncovers complex urban dynamics, including green space transformation in New York, water pollution diffusion in Hong Kong, and landfill evolution in Shenzhen.

Problem

Research questions and friction points this paper is trying to address.

urban environment change
multi-modal analysis
remote sensing change detection
heterogeneous urban data
sustainable development
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal agent
urban environment change
modality alignment
LLM framework
heterogeneous data integration
Zixuan Xiao
Department of Urban Planning and Design, The University of Hong Kong, Hong Kong
Jun Ma
Department of Urban Planning and Design, The University of Hong Kong, Hong Kong
Siwei Zhang
ETH Zurich
3D human pose estimation
human-scene interactions