MGCR-Net: Multimodal Graph-Conditioned Vision-Language Reconstruction Network for Remote Sensing Change Detection

📅 2025-08-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the insufficient use of multimodal information in remote sensing change detection, this paper proposes MGCR-Net, a multimodal graph-conditioned vision-language reconstruction network. The method introduces vision-language cross-modal reconstruction, which the authors describe as new to remote sensing change detection, using a dual-encoder architecture that separately extracts features from bitemporal images and from descriptive texts generated by a multimodal large language model. A semantic graph-conditioned reconstruction module (SGCM) built on graph attention generates vision-language tokens, multihead attention handles the interaction between visual and textual features, and a language vision transformer (LViT) fuses the reconstructed features for fine-grained alignment. Experiments on four public datasets show that MGCR-Net outperforms mainstream change detection methods.

šŸ“ Abstract
With the advancement of remote sensing satellite technology and the rapid progress of deep learning, remote sensing change detection (RSCD) has become a key technique for regional monitoring. Traditional change detection (CD) methods and deep learning-based approaches have made significant contributions to change analysis and detection. However, many of these methods remain limited in how they explore and apply multimodal data. To address this, we propose the multimodal graph-conditioned vision-language reconstruction network (MGCR-Net) to further explore the semantic interaction capabilities of multimodal data. Multimodal large language models (MLLMs) have attracted widespread attention for their outstanding performance in computer vision, particularly their powerful vision-language understanding and dialogic interaction capabilities. Specifically, we design an MLLM-based optimization strategy to generate multimodal textual data from the original CD images, which serves as the textual input to MGCR-Net. Visual and textual features are extracted through a dual-encoder framework. For the first time in the RSCD task, we introduce a multimodal graph-conditioned vision-language reconstruction mechanism, which is integrated with graph attention to construct a semantic graph-conditioned reconstruction module (SGCM). This module generates vision-language (VL) tokens through graph-based conditions and enables cross-dimensional interaction between visual and textual features via multihead attention. The reconstructed VL features are then deeply fused using the language vision transformer (LViT), achieving fine-grained feature alignment and high-level semantic interaction. Experimental results on four public datasets demonstrate that MGCR-Net achieves superior performance compared to mainstream CD methods. Our code is available at https://github.com/cn-xvkong/MGCR
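The abstract describes visual and textual features interacting through attention. The sketch below illustrates only the general idea with single-head scaled dot-product cross-attention, in which each visual token attends over the text tokens. It is not the authors' SGCM: graph conditioning, multiple heads, and learned projections are all omitted, and the function names and toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    Each query token (here: a visual token) is replaced by a weighted
    average of the value tokens (here: text tokens).
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return fused


# Hypothetical toy features: 3 visual tokens and 2 text tokens, dimension 4.
visual = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
text = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

fused = cross_attention(visual, text, text)
```

Every row of `fused` is a convex combination of the text tokens, so a visual token that matches one text token more closely pulls more of that token's content into its fused representation.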
Problem

Research questions and friction points this paper is trying to address.

Explores multimodal data interaction for remote sensing change detection
Integrates vision-language reconstruction with graph attention mechanisms
Enhances semantic alignment via multimodal graph-conditioned features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal graph-conditioned vision-language reconstruction network
MLLM-based optimization for textual data generation
Graph attention with semantic graph-conditioned reconstruction module
Chengming Wang
School of Computer Science and Technology, Shandong Technology and Business University, Yantai 264005, China
Guodong Fan
Tianjin University
Service Computing, Software Engineering, Large Language Models, Combinatorial Optimization
Jinjiang Li
School of Computer Science and Technology, Shandong Technology and Business University, Yantai 264005, China
Min Gan
School of Computer Science and Technology, Qingdao University, Qingdao 266000, China
C. L. Philip Chen
School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China