STAND: Semantic Anchoring Constraint with Dual-Granularity Disambiguation for Remote Sensing Image Change Captioning

📅 2026-04-25

🤖 AI Summary
This work addresses inaccurate change descriptions in remote sensing image change captioning (RSICC), which arise from three sources of ambiguity: viewpoint variation, scale inconsistency, and insufficient prior knowledge. The authors propose STAND, a framework that resolves these ambiguities progressively. An interpretable constraint first regularizes the temporal representations, giving later stages purified features. A dual-granularity disambiguation module then couples macro-level global context aggregation, aimed at viewpoint confusion, with micro-level frequency-refocused attention, aimed at small-object scale enhancement. Finally, a semantic concept anchoring module uses language categorical priors to guide decoding. The authors report that extensive experiments verify the superiority of STAND and its effectiveness in addressing these ambiguities.

📝 Abstract
Remote sensing image change captioning (RSICC) aims to describe the differences between two remote sensing images. While recent methods have explored video modeling, they largely overlook the inherent ambiguities in viewpoint, scale, and prior knowledge, and they lack effective constraints on the encoder. In this paper, we present STAND, a Semantic Anchoring Constraint with Dual-Granularity Disambiguation for RSICC, to progressively resolve these ambiguities. Specifically, to establish a reliable feature foundation, we first introduce an interpretable constraint to regularize temporal representations. Operating on these purified features, a dual-granularity disambiguation module resolves spatial uncertainties by coupling macro-level global context aggregation, which targets viewpoint confusion, with micro-level frequency-refocused attention, which targets small-object scale enhancement. Finally, to translate these visually disambiguated features into precise text, a semantic concept anchoring module leverages language categorical priors to tackle knowledge ambiguity during decoding. Extensive experiments verify the superiority of STAND and its effectiveness in addressing these ambiguities.
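
The abstract does not say how the frequency-refocused attention is computed. As a rough illustration of the general idea only, the sketch below re-weights a 2D feature map toward its high-frequency content, which is where small objects tend to show up. It is not STAND's actual module: the function name `high_pass_refocus`, the `cutoff` parameter, the binary high-pass mask, and the max-normalized attention are all assumptions made for this example.

```python
import numpy as np


def high_pass_refocus(feature_map, cutoff=0.1, eps=1e-8):
    """Hypothetical helper (not from the paper): emphasize fine detail.

    feature_map: 2D float array (H, W).
    cutoff: assumed radial frequency threshold in cycles per pixel.
    Returns (refocused_map, attention), both of shape (H, W).
    """
    feature_map = np.asarray(feature_map, dtype=float)
    h, w = feature_map.shape

    # Frequency coordinates of each FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)

    # Assumed binary high-pass mask: drop everything at or below the cutoff.
    mask = (radius > cutoff).astype(float)

    # Filter in the frequency domain and return to the spatial domain.
    spectrum = np.fft.fft2(feature_map)
    high = np.fft.ifft2(spectrum * mask).real

    # Normalize the high-frequency magnitude into an attention map in [0, 1].
    magnitude = np.abs(high)
    attention = magnitude / (magnitude.max() + eps)

    return feature_map * attention, attention


if __name__ == "__main__":
    # A single bright pixel on a flat background stands in for a small object.
    spot = np.zeros((16, 16))
    spot[5, 7] = 1.0
    _, attention = high_pass_refocus(spot)
    print(np.unravel_index(attention.argmax(), attention.shape))
```

In this toy setting a flat region has almost no high-frequency energy and receives near-zero attention, while an isolated small structure receives the largest weight. A learned module would typically replace the fixed mask and normalization with trainable components.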
Problem

Research questions and friction points this paper is trying to address.

remote sensing image change captioning
ambiguity
viewpoint
scale
prior knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Anchoring
Dual-Granularity Disambiguation
Remote Sensing Image Change Captioning
Temporal Representation Regularization
Frequency-Refocused Attention