UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model

📅 2026-05-05
📈 Citations: 0 (influential: 0)

📝 Abstract
Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide limited coverage of high-resolution urban construction scenarios. To address these challenges, we propose PTNet, a prototype-guided task-adaptive framework for joint change captioning and detection. PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross-temporal interaction, disentangles task-specific representations via multi-head gating, and injects detection-derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine-grained spatial sensitivity. Furthermore, we construct UCCD, a large-scale UAV-based benchmark comprising 9,000 high-resolution image pairs and 45,000 annotated sentences for urban construction monitoring. Extensive experiments on UCCD and WHU-CDC demonstrate that PTNet consistently outperforms existing methods. The dataset and source code are publicly available at https://github.com/G124556/ptnet.
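The three mechanisms named in the abstract (a prototype bank guiding cross-temporal interaction, task-specific gating, and a detection-derived spatial prior fed to the captioner) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, tensor shapes, the use of a plain feature difference, the single gate per task in place of true multi-head gating, and the residual form of the prior injection are all assumptions made for illustration, and random weights stand in for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prototype_guided_change(feat_t1, feat_t2, prototypes):
    """Attend from per-token temporal differences to a bank of change prototypes.

    Assumption: the change cue is a simple difference of the two feature maps.
    """
    diff = feat_t2 - feat_t1                      # (N, D) raw change cue
    scale = np.sqrt(prototypes.shape[1])
    attn = softmax(diff @ prototypes.T / scale)   # (N, K) token-to-prototype weights
    change = attn @ prototypes                    # (N, D) change feature built from prototypes
    return change, attn

def task_gates(change, w_det, w_cap):
    """Split the shared change feature into detection and caption branches.

    Assumption: one sigmoid gate per task, a simplification of multi-head gating.
    """
    det_feat = sigmoid(change @ w_det) * change
    cap_feat = sigmoid(change @ w_cap) * change
    return det_feat, cap_feat

def inject_spatial_prior(cap_feat, det_feat, w_mask):
    """Reweight caption tokens by a per-token change probability from the detection branch."""
    prior = sigmoid(det_feat @ w_mask)            # (N,) soft change mask
    return cap_feat * (1.0 + prior[:, None]), prior  # residual reweighting (assumed form)

# Toy inputs: N spatial tokens, D channels, K prototypes (all hypothetical sizes).
rng = np.random.default_rng(0)
N, D, K = 16, 8, 4
feat_t1 = rng.normal(size=(N, D))
feat_t2 = rng.normal(size=(N, D))
prototypes = rng.normal(size=(K, D))
w_det = rng.normal(size=(D, D))
w_cap = rng.normal(size=(D, D))
w_mask = rng.normal(size=D)

change, attn = prototype_guided_change(feat_t1, feat_t2, prototypes)
det_feat, cap_feat = task_gates(change, w_det, w_cap)
cap_tokens, prior = inject_spatial_prior(cap_feat, det_feat, w_mask)
print(attn.shape, cap_tokens.shape, prior.shape)
```

In this sketch each token's change feature is a convex combination of prototypes, which is one simple way to force changes into a small structured vocabulary. The released code at the repository linked in the abstract is the authoritative reference for how PTNet actually does this.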
Problem

Research questions and friction points this paper is trying to address.

Remote Sensing Image Change Captioning
Urban Construction Change
Change Detection
Semantic Understanding
UAV Monitoring
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype-Guided Framework
Structured Change Semantics
Multi-Head Gating
Spatial Prior Injection
UAV-Based Benchmark