Referring Remote Sensing Image Segmentation via Bidirectional Alignment Guided Joint Prediction

📅 2025-02-12
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Addressing challenges in remote sensing image referring segmentation—including weak visual-linguistic modality alignment, poor localization of small objects, ambiguous object boundaries, and multi-scale interference—this paper proposes a novel cross-modal segmentation framework. Methodologically, it integrates multi-scale feature interaction, contrastive spatial correlation computation, and joint vision-language prediction. Key contributions include: (1) bidirectional spatial correlation modeling to enhance fine-grained vision-language alignment; (2) a target-background dual-stream decoder to improve discriminability between foreground and background; and (3) a dual-modality object learning strategy to strengthen semantic consistency. Evaluated on RefSegRS and RRSIS-D, the method achieves state-of-the-art performance with overall IoU scores of 80.57% and 79.23%, surpassing prior best methods by 3.76 and 1.44 percentage points, respectively; mean IoU improves by 5.37 and 1.84 points.

📝 Abstract
Referring Remote Sensing Image Segmentation (RRSIS) is critical for ecological monitoring, urban planning, and disaster management, requiring precise segmentation of objects in remote sensing imagery guided by textual descriptions. This task is uniquely challenging due to the considerable vision-language gap, the high spatial resolution and broad coverage of remote sensing imagery with diverse categories and small targets, and the presence of clustered, unclear targets with blurred edges. To tackle these issues, we propose ours, a novel framework designed to bridge the vision-language gap, enhance multi-scale feature interaction, and improve fine-grained object differentiation. Specifically, ours introduces: (1) the Bidirectional Spatial Correlation (BSC) for improved vision-language feature alignment, (2) the Target-Background TwinStream Decoder (T-BTD) for precise distinction between targets and non-targets, and (3) the Dual-Modal Object Learning Strategy (D-MOLS) for robust multimodal feature reconstruction. Extensive experiments on the benchmark datasets RefSegRS and RRSIS-D demonstrate that ours achieves state-of-the-art performance. Specifically, ours improves the overall IoU (oIoU) by 3.76 percentage points (80.57) and 1.44 percentage points (79.23) on the two datasets, respectively. Additionally, it outperforms previous methods in the mean IoU (mIoU) by 5.37 percentage points (67.95) and 1.84 percentage points (66.04), effectively addressing the core challenges of RRSIS with enhanced precision and robustness.
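The abstract reports both overall IoU (oIoU) and mean IoU (mIoU). As a reference for how these two metrics differ, here is a minimal sketch of the conventional computation used in referring segmentation benchmarks (not the authors' code): oIoU pools intersections and unions over the whole dataset, so large objects dominate, while mIoU averages per-sample IoU, weighting small targets equally.

```python
import numpy as np

def oiou_miou(preds, gts):
    """Compute (oIoU, mIoU) over paired lists of boolean masks.

    preds, gts: lists of (H, W) boolean arrays.
    oIoU: cumulative intersection / cumulative union over all samples.
    mIoU: mean of per-sample IoU (empty-union samples count as 1.0).
    """
    inter_total, union_total, per_sample = 0, 0, []
    for p, g in zip(preds, gts):
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        inter_total += inter
        union_total += union
        per_sample.append(inter / union if union > 0 else 1.0)
    return inter_total / union_total, float(np.mean(per_sample))
```

For example, a perfectly segmented large object alongside a half-missed small one yields a higher oIoU than mIoU, which is why papers on small-target RRSIS report both.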
Problem

Research questions and friction points this paper is trying to address.

Bridging vision-language gap in remote sensing
Enhancing multi-scale feature interaction
Improving fine-grained object differentiation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional Spatial Correlation alignment
Target-Background TwinStream Decoder
Dual-Modal Object Learning Strategy
🔎 Similar Papers
2024-09-20 · IEEE Transactions on Geoscience and Remote Sensing · Citations: 2
Tianxiang Zhang
Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Zhaokun Wen
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Bo Kong
University of Kansas Medical Center, Rutgers University
Kecheng Liu
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Yisi Zhang
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Peixian Zhuang
Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Jiangyun Li
Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Shunde Graduate School of University of Science and Technology Beijing, China