Geospatial-Reasoning-Driven Vocabulary-Agnostic Remote Sensing Semantic Segmentation

2026-02-09
AI Summary
This work addresses the challenge that existing open-vocabulary remote sensing semantic segmentation methods struggle to distinguish spectrally similar yet semantically distinct land cover types due to a lack of geospatial contextual awareness. To overcome this limitation, we propose the Geospatial Reasoning Chain-of-Thought (GR-CoT) framework, which introduces geospatial contextual reasoning into this task for the first time. GR-CoT dynamically constructs image-adaptive vocabularies through scene anchoring, feature disentanglement, and knowledge-driven decision-making, synergistically combining offline knowledge distillation with online instance-level reasoning to guide pixel-wise semantic alignment. By integrating multimodal large language models, vision–text alignment, and a geospatial reasoning chain, our method achieves significant performance gains over state-of-the-art approaches on the LoveDA and GID-5 benchmarks, notably improving segmentation accuracy for visually ambiguous land cover classes.

Abstract
Open-vocabulary semantic segmentation has emerged as a promising research direction in remote sensing, enabling the recognition of diverse land-cover types beyond pre-defined category sets. However, existing methods predominantly rely on the passive mapping of visual features and textual embeddings. This "appearance-based" paradigm lacks geospatial contextual awareness, leading to severe semantic ambiguity and misclassification when encountering land-cover classes with similar spectral features but distinct semantic attributes. To address this, we propose a Geospatial Reasoning Chain-of-Thought (GR-CoT) framework designed to enhance the scene understanding capabilities of Multimodal Large Language Models (MLLMs), thereby guiding open-vocabulary segmentation models toward precise mapping. The framework comprises two collaborative components: an offline knowledge distillation stream and an online instance reasoning stream. The offline stream establishes fine-grained category interpretation standards to resolve semantic conflicts between similar land-cover types. During online inference, the framework executes a sequential reasoning process involving macro-scenario anchoring, visual feature decoupling, and knowledge-driven decision synthesis. This process generates an image-adaptive vocabulary that guides downstream models to achieve pixel-level alignment with correct geographical semantics. Extensive experiments on the LoveDA and GID5 benchmarks demonstrate the superiority of our approach.
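
The three-step online reasoning process described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the knowledge base entries, the `ImageDescription` stand-in for MLLM output, and all function names are hypothetical, and a simple set-matching rule replaces the MLLM reasoning. It only shows how scene context might separate categories that share the same visual cue (here, "vegetation") when building an image-adaptive vocabulary.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the offline stream's category interpretation
# standards. Every entry below is invented purely for illustration.
KNOWLEDGE_BASE = {
    "agricultural land": {"scenes": {"rural"},
                          "cues": {"vegetation", "regular parcels"}},
    "urban green space": {"scenes": {"urban"},
                          "cues": {"vegetation", "adjacent buildings"}},
    "water": {"scenes": {"rural", "urban"},
              "cues": {"smooth dark surface"}},
    "building": {"scenes": {"rural", "urban"},
                 "cues": {"rectangular roofs"}},
}


@dataclass
class ImageDescription:
    """Hypothetical stand-in for what an MLLM might report about one image."""
    scene: str
    visual_cues: set


def anchor_scene(desc):
    """Step 1, macro-scenario anchoring: fix the scene-level context."""
    return desc.scene


def decouple_features(desc):
    """Step 2, visual feature decoupling: collect cues independent of labels."""
    return set(desc.visual_cues)


def synthesize_vocabulary(scene, cues, knowledge):
    """Step 3, knowledge-driven decision synthesis (assumed rule):
    keep a category only if its standard allows this scene and all of
    its required cues were observed."""
    vocab = [
        name
        for name, standard in knowledge.items()
        if scene in standard["scenes"] and standard["cues"] <= cues
    ]
    return sorted(vocab)


def build_image_adaptive_vocabulary(desc, knowledge=KNOWLEDGE_BASE):
    """Run the three steps in sequence and return the per-image vocabulary."""
    scene = anchor_scene(desc)
    cues = decouple_features(desc)
    return synthesize_vocabulary(scene, cues, knowledge)


rural = ImageDescription(
    "rural", {"vegetation", "regular parcels", "smooth dark surface"})
urban = ImageDescription(
    "urban", {"vegetation", "adjacent buildings", "rectangular roofs"})

rural_vocab = build_image_adaptive_vocabulary(rural)
urban_vocab = build_image_adaptive_vocabulary(urban)
```

In this sketch both images contain vegetation, yet the rural image yields "agricultural land" and the urban image yields "urban green space". In the framework as described, the resulting vocabulary would then be passed to a downstream open-vocabulary segmentation model; that step is omitted here.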
Problem

Research questions and friction points this paper is trying to address.

open-vocabulary semantic segmentation
geospatial reasoning
remote sensing
semantic ambiguity
land-cover classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geospatial Reasoning
Open-vocabulary Semantic Segmentation
Multimodal Large Language Models
Knowledge Distillation
Chain-of-Thought Reasoning
Chufeng Zhou
Wuhan University of Science and Technology, Wuhan, China
Jian Wang
Wuhan University of Science and Technology, Wuhan, China
Xinyuan Liu
Beijing University of Posts and Telecommunications, Beijing, China
Xiaokang Zhang
Wuhan University, School of Artificial Intelligence
Artificial Intelligence, Remote Sensing